Abstract

In this research, face recognition and speaker identification systems are each converted
into verification systems. The two verification systems are then fused to form a
single identity verification system. Finally, the use of the Karhunen-Loeve Transform
(KLT) for dimensional reduction is examined for suitability in the verification task.

The base face recognition system used the KLT for feature reduction and a back-propagation
neural net for classification. Verification involved training a net for each
individual  in  the  database  for  two  classes  of  outputs,  ‘Joe’  or  ‘not  Joe.’  The  base  speaker 
identification  system  used  Cepstral  analysis  for  feature  extraction  and  a  distortion  measure 
for  classification.  Verification  in  this  case  involved  performing  the  KLT  on  the  Cepstral 
coefficients and then classifying using a two-class neural net for each individual, similarly
to  the  face  verifier  implementation. 

KLT  feature  reduction  is  compared  to  alternative  linear  and  non-linear  methods,  and 
the  KLT  is  found  to  provide  superior  performance.  The  fusion  of  the  two  base  verification 
systems  is  shown  to  provide  superior  performance  over  either  system  alone. 
I. Introduction

The  automated  recognition  of  individuals  is  an  area  of  great  interest  to  both  the 
military  and  the  commercial  communities.  Instilling  such  a  capability  into  a  machine 
would  benefit  many  diverse  activities,  such  as  validating  the  identity  of  an  Automatic  Teller 
Machine  user  or  distinguishing  a  terrorist  within  a  bustling  airport  crowd.  A  recognition 
system  could  serve  as  vital  a  function  as  protecting  our  national  security  by  ensuring  only 
authorized personnel are granted access to restricted data in government computer systems.
It  could  also  perform  as  trivial  a  function  as  recognizing  and  greeting  the  user  upon  startup 
of  the  latest  computer  game.  Table  1.1  provides  a  listing  of  possible  applications  (42). 

Though  humans  are  able  to  perform  individual  recognition  with  relative  ease,  the 
challenge  of  automating  this  function  has  been  daunting  researchers  for  over  four  decades. 
This  is  not  to  say  that  all  work  in  the  area  has  been  fruitless;  in  fact,  there  has  been 
considerable  success  in  finding  solutions  to  certain  elements  of  the  problem.  Unfortunately, 
even  with  those  successes,  there  is  currently  no  automated,  autonomous  system  capable  of 
accurately  and  consistently  identifying  individuals  in  real  time.  Such  a  system  would  find 
a  ready  market  in  today’s  world. 

In  the  remainder  of  this  chapter,  some  background  on  past  and  ongoing  research 
in  the  fields  of  face  and  speaker  recognition  will  be  provided,  as  well  as  a  statement  of 
the  problem  to  be  investigated  in  this  research.  Objectives,  assumptions,  and  known 
limitations  will  then  be  outlined,  followed  finally  by  a  description  of  the  methodology 
which  will  be  observed  in  the  performance  of  this  research. 

1.1  Background 

This section contains a brief discussion of the application of pattern recognition
techniques to face recognition and speaker identification and outlines some past and present
research in the field. Further details regarding this research will be provided in the review
of the literature.

Table 1.1 Applications for Positive Identification Systems (42)

ACCESS TO RESTRICTED AREAS

Airport Cargo, Ticket and Baggage Areas
Airport Control Towers, Refueling and Maintenance Areas
Embassies and Corporate Offices in Foreign Countries
Nuclear Facilities, Conventional Power Stations and Grid Control Stations
Munitions and Hazardous Materials Storage Areas
Corporate Archives and Computer Centers
Engineering Labs
Blood Banks, Tissue Banks and Forensic Labs
Research and Development Facilities

ACCESS TO DISTRIBUTION OF GOODS AND SERVICES

Automatic Teller Machines - Cash
Point of Sale Terminals - Goods and Credit
Welfare Agencies - Food Stamps and Cash
Drug and Other Clinics - Medication
Computer Networks - Electronic Fund Transfers

ACCESS TO RESTRICTED INFORMATION

Company Proprietary Data, Plans and Forecasts
Government Reports and Regulations in Progress
Classified Government Files
Financial Securities Transactions
R & D Technical and Business Data
Medical and Personnel Records
Patent Applications
Wills and Personal Papers
Competitive Proposals

Pattern recognition is an area of research pertaining to the ability of a system (biological
or mechanical) to perceive and identify certain characteristics of some target, be
the target an image, a sound, or any other piece of data, and determine the classification
of the target from those characteristics. The human brain is an excellent example of a
biological pattern recognition system. For instance, we find it very easy to differentiate
between the letter ‘Z’ and the number ‘2,’ though both these symbols contain several common
characteristics. Our brains provide us the capability of selecting those characteristics
that make the symbols different. Such characteristics are classically known as features, and
the success of a pattern recognition system depends on its ability to extract appropriate
and sufficient features to perform robust classification. As with other recognition systems,
both face and voice recognizers must use features that allow discrimination of one target
from another, yet will also allow recognition of different occurrences of the same target.

The problem of pattern recognition has historically been broken down into three subproblems:
segmentation, feature extraction, and classification (63). Segmentation involves
determining  the  area  of  interest  in  a  collection  of  data;  that  is,  determining  what  region 
in  the  data  space  may  contain  targets.  Feature  extraction  involves  determining  which 
features  within  (or  derived  from)  the  data  set  will  be  used  to  perform  the  classification. 
In  the  classification  step  the  target  is  identified  as  belonging  to  a  certain  class  (or  in  some 
recognition  systems  may  be  identified  as  not  belonging  to  a  certain,  or  any,  class).  Figure 
1.1  pictorially  illustrates  an  elementary  pattern  recognition  system. 

Figure 1.1 A basic pattern recognition system.

1.1.1 Face recognition. The segmentation problem in face recognition has traditionally
attracted the least attention, and in most research efforts the target faces have
been manually segmented; in other words, target faces within images were manually centered
before presentation to the feature extraction mechanism. In an Air Force Institute of
Technology (AFIT) thesis, Kevin Gay tackled the segmentation problem, and developed a
system which used two consecutive frames from a videotape of a subject to determine the
location of the face within an image (17). His technique was based on the fact that the
head will always be undergoing some relative motion, and subtracting one frame from the
other will specify that motion. That information can then be used to determine where the
head, and thus the face, can be found within the image.

Figure  1.2  presents  an  example  of  the  segmentation  process.  The  two  photographs  are 
consecutive frames from a videotape sequence, and the “Motion” image is the difference
of  the  two  frames.  The  motion  image  is  filtered  to  remove  extraneous  noise,  and  then 
presented  to  an  algorithm  that  fills  in  all  pixels  below  any  detected  motion.  The  result  is 
a  mapping  in  space  that  defines  where  the  face  lies.  Note  that  this  application  depends 
on  a  cooperative  target  with  face  turned  toward  the  camera,  and  the  images  obtained  can 
only  contain  the  single  target.  These  are  not  limitations  if  trying  to  verify  an  individual’s 
identity  in  a  cooperative  situation,  but  adaptation  would  be  required  to  use  such  a  system 
with  a  non-cooperative  target  or  with  a  specific  target  within  a  crowd. 
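
The frame-differencing idea described above can be illustrated with a short sketch. This is not Gay's implementation: the function names, the threshold value, and the toy frames are assumptions made for illustration, and the noise-filtering step is omitted.

```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold=20):
    """Binary motion image: True where the absolute difference between
    two consecutive grey-level frames exceeds a (hypothetical) threshold."""
    diff = np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32))
    return diff > threshold

def fill_below(mask):
    """Mark every pixel at or below the topmost motion pixel in its column,
    using a cumulative logical OR down the rows."""
    return np.logical_or.accumulate(mask, axis=0)

# Toy example: a 6x5 scene in which three pixels on row 2 change.
frame_a = np.zeros((6, 5), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[2, 1:4] = 200

region = fill_below(motion_mask(frame_a, frame_b))
print(region.astype(int))
```

The resulting region covers rows 2 through 5 of the three moving columns, which plays the role of the spatial mapping that indicates where the head lies.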
Figure 1.2 Example of segmentation process using Gay’s system (17).


The feature extraction problem has been attacked in two general ways. The first
methodology relies on extraction and examination of specific facial features to determine
the individual to whom the face belongs. Mannaert and Oosterlinck proposed a representative
implementation of this method, basing their feature discriminants on geometric
proportions, surface properties, and iconic features of the face (34).

The second method uses a holistic approach, in which the faces are examined as
a  whole.  With  this  technique,  it  may  be  appropriate  to  think  of  the  face  as  being  a 
feature  in  itself,  as  specific  features  within  the  face  image  are  not  extracted.  Turk  and 
Pentland,  at  the  Massachusetts  Institute  of  Technology,  have  been  strong  proponents  of 
this  methodology,  contending  that  “individual  facial  features  such  as  the  eyes  or  nose  may 
not  be  as  important  to  human  face  recognition  as  the  overall  pattern  capturing  a  more 
holistic  encoding  of  the  face”  (65).  They  have  developed  the  concept  of  “eigenfaces,”  which 
are  the  recognition  system’s  representations  of  the  variations  in  a  set  of  face  images.  This 
information  is  used  to  encode  and  compare  the  facial  features  of  individuals.  Researchers 
at  AFIT  have  implemented  a  neural  network-based,  holistic  recognition  system  relying  on 
the  eigenface  approach.  In  a  thesis  by  Ken  Runyon,  we  find  that  the  system  performs  fairly 
well  for  some  tests,  but  does  have  limitations,  such  as  a  marked  degradation  in  performance 
when  presented  with  a  target  image  that  was  obtained  one  or  more  days  after  the  image 
on  which  the  system  was  trained  (52). 
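
As a rough illustration of the eigenface idea, the sketch below computes a Karhunen-Loeve basis from a small set of image vectors and projects one image onto it. It is a minimal sketch, not the AFIT system: the random "faces," the image size, and the function names are assumptions, and the basis is obtained from a singular value decomposition of the mean-centered data.

```python
import numpy as np

def klt_basis(images, k):
    """images: array of shape (n_samples, n_pixels).
    Returns the mean image and the k leading eigenvectors ("eigenfaces")
    of the sample covariance, one per row."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the eigenvectors of the covariance matrix,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Reduced feature vector: coefficients of the image in the KLT basis."""
    return basis @ (image - mean)

# Hypothetical data: 10 "faces" of 64 pixels each.
rng = np.random.default_rng(0)
faces = rng.random((10, 64))

mean, basis = klt_basis(faces, 3)
coeffs = project(faces[0], mean, basis)
print(coeffs.shape)
```

Each face is thus reduced from 64 values to 3 coefficients, which could then be passed to a classifier.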

Gordon  recently  proposed  a  relatively  unique  approach  to  face  recognition:  using 
depth  and  curvature  features  of  the  face  to  determine  identity  (20).  The  strategy  is  similar 
to the standard feature extraction of Mannaert and Oosterlinck, but relies on a completely
different  feature  set  that  is  not  restricted  to  two  dimensions  in  space.  Gordon  developed 
and  implemented  a  system  at  Harvard  University  based  on  the  use  of  a  rotating  laser 
scanner  system  to  obtain  range  (depth  and  curvature)  data  from  subjects.  She  reports 
excellent  classification  capability  when  the  three-dimensional  features  are  presented  to  the 
recognizer.  Range  data  has  often  been  used  in  other  pattern  recognition  problems,  but 
only  recently  has  equipment  with  the  required  accuracy  become  available  at  acceptable 
cost. 


1.1.2  Speaker  Identification.  Automated  speaker  recognition  is  dependent  on 
features  found  within  acoustic  speech  signals  for  classification.  Many  speaker  recognition 
systems  use  linear  predictive  analysis,  a  method  in  which  the  speaker’s  speech  patterns 
are  used  to  develop  a  parametric  model;  new  instances  of  speech  can  then  be  compared 
against  this  model  to  determine  whether  they  match  the  model  to  some  degree  of  accuracy. 
The major deficiency of this approach is the model’s poor performance in noise. This
occurs  because  some  of  the  primary  assumptions  under  which  the  model  was  developed  are 
violated  when  the  speech  signal  is  corrupted  by  noise  (39:165).  Another  approach  is  to  use 
an  auditory  model,  in  which  speech  is  converted  to  a  representation  of  the  auditory  nerve 
firing  patterns  found  within  the  human  aural  system.  In  other  words,  a  cochlear  model  is 
developed  based  on  the  workings  of  the  human  cochlea,  and  acoustic  speech  presented  to 
this  model  is  recoded  to  simulate  the  firing  of  nerves  along  the  basilar  membrane  within 
the  auditory  system.  Because  this  approach  attempts  to  emulate  a  system  that  is  known 
to  work  well  (human  hearing),  the  hope  is  that  more  robust  recognition  can  be  performed. 
In  an  AFIT  thesis  by  John  Colombi,  such  a  cochlear  model  is  implemented  and  compared 
to  the  traditional  Linear  Predictive  Coding  approach  (10). 

1.2  Problem  Statement 

In the course of this research, an implementation will be developed that fuses derivations
of the existing AFIT face recognition and speaker identification systems to provide a
user verification capability. The bulk of the research will be concentrated on implementing
this fusion and determining whether the technique currently being used to extract features
from faces is appropriate for the task. Alternate methods for feature extraction will
be  explored.  The  system  performance  will  be  measured  using  two  metrics:  classification 
accuracy,  and  identification  speed.  The  classification  accuracy  will  be  a  function  of  the 
robustness of the system, and will be based not only on identification accuracy, but also
on rejection accuracy; that is, both mistaken recognition and mistaken rejection will be
considered.  Recognition  speed  will  be  based  on  the  perception  of  “acceptable”  time  for 
recognition.  This  is  a  somewhat  subjective  measure,  imposed  by  the  probability  that  a 
system  user  will  only  accept  a  recognition  system  if  it  performs  with  a  certain  relative 
speed.  Of  these  two  measurement  metrics,  classification  accuracy  will  be  considered  the 
more  important. 
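
The two kinds of error mentioned above, mistaken recognition and mistaken rejection, can be counted as in the following sketch. The scores, the threshold, and the function name are hypothetical; the sketch only assumes that a verifier emits a score and accepts a claim when the score reaches a threshold.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false acceptance rate, false rejection rate) for a verifier
    that accepts a claim when its score is at or above the threshold."""
    false_rejects = sum(s < threshold for s in genuine_scores)
    false_accepts = sum(s >= threshold for s in impostor_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

# Hypothetical verifier outputs for true claimants and for impostors.
genuine = [0.9, 0.8, 0.4, 0.95]
impostor = [0.1, 0.6, 0.3, 0.2]

far, frr = error_rates(genuine, impostor, threshold=0.5)
print(far, frr)  # 0.25 0.25
```

Raising the threshold lowers the false acceptance rate at the cost of more false rejections, so both figures need to be reported together.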
1.3 Research Objectives

The  primary  objectives  of  this  research  are  to  fuse  two  single-sensor  recognition 
systems  (face  and  voice)  into  a  single  identity  verification  system  and  to  attempt  to  improve 
the  classification  performance  of  the  face  identification  portion  of  the  fused  system.  The 
former  will  require  that  methods  be  developed  to  convert  the  individual  recognition  systems 
into  verification  systems,  and  then  that  the  outputs  of  these  verification  systems  be  fused 
in  some  probabilistic  space.  Improvements  to  the  system  will  be  attempted  by  determining 
the  appropriateness  of  the  current  feature  extraction  method  and  developing  new  ones  for 
testing. 

1.4  Assumptions 

•  Methods  can  be  developed  to  convert  the  existing  recognition  systems  (identifying 
an  individual  as  a  member  of  a  data  base)  into  verification  systems  (verifying  that 
the  individual  is  who  he/she  claims  to  be). 

•  The  outputs  of  the  individual  verification  systems  will  be,  or  will  be  directly  related 
to, posterior (a posteriori) probabilities. This will allow simple probabilistic fusing.

•  This  research  will  depend  on  a  cooperative  subject.  The  orientation  of  the  head 
in  the  image  will  be  face  forward,  and  the  subject  will  be  alone  and  close  enough 
to the camera to allow either manual or automatic segmentation. The subject will
speak  into  a  microphone  when  prompted,  and  the  acoustic  environment  will  not  be 
excessively  noisy. 
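
To illustrate why posterior outputs make fusion simple, the sketch below combines two posteriors with one possible rule. It is an assumption, not the fusion method of this research: it supposes that the face and voice measurements are conditionally independent given the true identity, and that both verifiers share the same prior probability that the claim is true.

```python
def fuse_posteriors(p_face, p_voice, prior=0.5):
    """Fuse two posteriors P(claim true | face) and P(claim true | voice),
    assuming conditional independence of the two sensors and a shared prior."""
    accept = p_face * p_voice / prior
    reject = (1.0 - p_face) * (1.0 - p_voice) / (1.0 - prior)
    if accept + reject == 0.0:
        raise ValueError("verifiers are certain and contradict each other")
    return accept / (accept + reject)

print(fuse_posteriors(0.8, 0.8))  # two agreeing verifiers reinforce each other
print(fuse_posteriors(0.8, 0.5))  # an uninformative verifier changes nothing
```

Under these assumptions, two moderately confident verifiers yield a fused posterior higher than either alone, while a verifier that outputs the prior leaves the other's output unchanged.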

1.5  Scope  and  Limitations 

The  scope  of  this  research  will  be  limited  to  exploring  methods  of  converting  and 
fusing  the  two  recognition  systems  and  analyzing  the  use  of  an  alternate  feature  set  for 
presentation  to  the  face  verifier.  The  performance  may  be  limited  by  the  accuracy  of  the 
computational  and  image-capturing  hardware  provided. 


1.6 Approach/Methodology

Existing  software  will  be  modified  or  new  software  will  be  developed  and  implemented 
on a Sun SPARCstation 2. The existing face segmentation software will be retained in
whole  or  in  part,  as  will  classification  algorithms  developed  during  previous  AFIT  face 
recognition  research  efforts.  The  face  recognition  system  used  will  extract  features  using 
the  Karhunen-Loeve  Transform  (KLT)  technique  as  outlined  in  Chapter  2  of  this  thesis, 
and  will  train  a  back-propagation,  multi-layer  perceptron  to  perform  the  classification 
task. Speaker recognition algorithms developed at AFIT will also be retained in whole or
in part; Cepstral analysis will be used for feature extraction and a distortion metric for
classification.  The  development  software  used  will  be  a  UNIX  implementation  of  ANSI  C. 
The  following  tasks  will  be  accomplished  during  the  course  of  the  research  effort: 

1.  Convert  the  existing  AFIT  holistic  face  recognition  system  into  a  holistically-based 
face  verification  system.  Perform  tests  using  this  system  to  develop  a  baseline  against 
which  to  measure  modified  or  newly  developed  systems  and  capabilities. 

2.  Convert  the  existing  AFIT  auditory  model-based  speaker  identification  system  into  a 
speaker  verification  system.  As  with  the  face  verifier,  perform  tests  using  this  system 
to  develop  a  baseline  against  which  to  measure  modified  or  newly  developed  systems 
and  capabilities. 

3.  Develop  the  algorithms  necessary  to  fuse  the  two  verification  systems  into  a  single 
user  verification  system. 

4.  Develop  the  algorithms  for  testing  and  analyzing  the  use  of  an  alternate  feature  set 
for  face  verification  based  on  a  calculated  Figure  of  Merit  (FoM). 

5.  Develop  the  algorithms  for  testing  and  analyzing  the  use  of  an  alternate  feature  set 
for  face  verification  based  on  nonlinear  dimensional  transformation. 

6. Measure the performance of each of the verification systems alone and of the fused
verification system using the KLT/FoM-based face feature sets.
1.7 Conclusion


The systems developed during this research effort will help further understanding of
the face recognition and speaker identification processes, bringing us one step closer to our
goal of constructing a reliable and accurate mechanism for autonomous identity verification.
Some uses for such a mechanism were briefly mentioned in the introduction to this
chapter, but there exist many other activities which will also benefit; indeed, it should be
expected that an identity verification system will serve purposes of which we have not yet
conceived. But until the day comes that such a capability exists, researchers will continue
the quest for increased knowledge about the mechanics of recognition through efforts similar
to the one to be undertaken here.


In  the  next  chapter,  we  shall  review  recent  and  current  research  into  the  areas  of  face 
recognition,  speaker  identification,  and  multi-sensor  fusion.  Methodologies  and  motivations 
for  user  verification  will  be  presented,  as  well  as  a  brief  survey  of  current  implementations. 
II. Literature Review


2.1  Introduction 

Security  issues  play  an  increasingly  important  role  in  both  the  long-  and  short-term 
operation  of  the  government.  The  ability  to  protect  data  within  the  electronic  confines 
of  a  computer  system  becomes  vital  as  one  considers  the  threat  to  national  security  that 
could ensue if unauthorized agents were given improper access. The main thrust in the
area  of  computer  security  has  thus  been  to  develop  procedures  to  ensure  only  authorized 
users  are  permitted  access  to  important  data.  These  procedures  have  traditionally  ranged 
from  implementing  simple  password  protection  schemes  to  verifying  some  physical  device, 
such  as  an  access  card,  and  even  to  requiring  manual  identification  and  verification  of  users 
by  security  guards.  The  problem  with  the  first  method  is  that  passwords  can  be  “broken,” 
and  the  second  method  carries  with  it  the  assumption  that  the  access  device  will  always 
be  with  the  authorized  user.  The  third  method  requires  human  security  guards,  who  by 
nature  are  extremely  adept  at  providing  accurate  individual  authentication,  but  will  tend 
to  perform  increasingly  poorly  as  more  persons  are  added  to  the  database  of  authorized 
users  (9).  Therefore,  it  is  natural  that  we  look  for  some  method  of  automating  the  user 
verification  process,  thereby  providing  increased  security. 

In this review, current research into automated identification and verification of individuals
will be examined. Research being performed in the areas of recognition and
identification  of  faces  will  first  be  surveyed,  followed  by  a  review  of  efforts  in  the  speaker 
identification  arena.  Finally,  methods  used  to  fuse  the  outputs  of  multiple  sensor  systems 
will  be  presented. 

2.2  Face  Recognition 

The  recognition  of  a  familiar  face  is  something  we  humans  take  for  granted.  We 
perform  this  task  with  admirable  precision,  and  only  seldom  is  any  effort  involved.  But 
for  all  that  natural  skill,  no  one  really  understands  precisely  how  we  perform  recognition. 
Recent  research  indicates  the  process  of  face  recognition  involves  at  least  two  major  steps 
(21). The first is known as segmentation, in which we are alerted to the fact that there is
a face within our visual field. At this stage, recognition of the person as an individual has
not occurred; we simply notice the face. In the next step we actually recognize the face as
being familiar to us. Efforts to mimic these biological processes with machines have met
with varying degrees of success, and in the following sections some of these efforts will be
outlined.

2.2.1  Segmentation  of  Faces.  The  problem  of  determining  whether  or  not  a  face 
is  present  in  the  visual  field  has  not  been  extensively  addressed  in  the  literature.  Most  face 
recognition  research  has  assumed  the  face  is  known  to  be  there,  and  is  already  pre-processed 
(scaled,  rotated,  and  positioned)  for  introduction  to  the  recognition  mechanism  itself. 
Govindaraju, Sher, et al. studied the problem of locating faces in newspaper photographs,
developing  a  geometric  model  of  the  prototypical  human  face  and  scanning  photographs 
for  images  approximately  representative  of  that  model  (21:p551).  The  technique  was  quite 
successful,  but  constraints  placed  on  the  problem  included: 

1.  Frontal  face  view  required  in  photograph. 

2.  Face  must  be  upright  with  negligible  tilt. 

3.  Faces  must  not  be  occluded  by  other  objects. 

4.  Face  must  be  at  least  some  minimum  size. 

5.  Image  must  have  some  minimum  resolution. 

6.  Number  of  faces  to  be  found  must  be  known. 

Though these restrictions may be well adapted to finding faces in newspaper photographs,
they are not likely to be adequate for real-world “face in a crowd” capturing.

A  more  complex  pattern-matching  scheme  has  been  proposed  by  Seitz  and  Bichsel 
(57).  Their  approach  involves  performing  a  hierarchical  search  for  features  in  progressively 
finer  resolution  images  (images  containing  progressively  higher  spatial  frequency  content), 
with  the  assumption  that  at  different  levels  of  resolution,  different  features  will  be  more 
important. For instance, at low resolutions (about two Hz), only the broad outline of the
head is searched for: a horizontally oriented line in the upper part of the image and two
vertical lines at the left and right sides of the image. At finer resolutions, the nose is
localized,  and  then  the  eyes  and  pupils.  This  proved  to  be  a  relatively  robust  method,  and 
was  somewhat  invariant  to  rotation;  once  the  pupils  were  found,  the  planar  rotation  of  the 
head  could  be  calculated,  and  the  face  could  be  rotated  into  a  standard  position  for  further 
processing.  The  particular  application  presented  did  not  account  for  scaling  differences, 
but  the  authors  state  that  in  principle,  the  information  was  available  to  scale  the  face  for 
an  unknown  size. 
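
The rotation correction mentioned above follows directly from the two pupil positions. The sketch below is illustrative only: the function name and the coordinates are hypothetical, and image coordinates are assumed to have x increasing to the right and y increasing downward.

```python
import math

def head_rotation_degrees(left_pupil, right_pupil):
    """In-plane rotation of the head, taken as the angle of the line joining
    the two pupils relative to the horizontal image axis. With y increasing
    downward, a positive angle means the right-hand pupil sits lower."""
    (x1, y1), (x2, y2) = left_pupil, right_pupil
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# Hypothetical pupil coordinates (x, y) in pixels.
print(head_rotation_degrees((10, 20), (30, 20)))  # level eyes: 0.0
print(head_rotation_degrees((10, 20), (30, 40)))  # tilted head
```

Rotating the image by the negative of this angle would bring the eyes level, which is the standard position needed for further processing.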

In  an  Air  Force  Institute  of  Technology  thesis  by  Kevin  Gay,  we  are  shown  that 
the  use  of  motion  analysis  may  be  a  suitable  technique  for  face  segmentation  (17).  His 
approach  was  to  capture  two  images  of  a  subject  from  a  fixed  camera  in  rapid  succession, 
then  perform  a  frame-to-frame  subtraction  to  determine  any  motion.  Based  on  the  fact 
that  humans  cannot  keep  their  heads  perfectly  still,  he  hypothesized  that  movement  in 
the image could correspond to the presence of a face. The motion image (the difference
between the two frames) was enhanced and then analyzed for detection of a face; if
one was found, it was cut from the image and resized to create a standard size vector for
input to the face recognition system. This approach proved to be fairly successful, but
not  flawless.  Gay  found  that  the  outlines  of  the  motion  images  were  not  consistent,  but 
could  not  determine  the  cause  of  the  inconsistency.  He  felt  that  given  a  better  method  of 
finding  the  motion  image,  standardization  and  face  discrimination  capability  could  likely 
be  improved. 

2.2.2  Recognition  of  Faces.  There  are  currently  two  major  approaches  being 
examined  by  researchers  into  face  recognition  techniques.  The  first  relies  on  extraction 
and  examination  of  specific  facial  features  to  determine  the  individual  to  whom  the  face 
belongs,  while  the  second  extols  a  holistic  approach,  examining  the  face  as  a  whole. 

2.2.2.1 Feature Extraction.

Facial Geometric Proportion. Mannaert and Oosterlinck proposed a representative
implementation of the first method, basing
their feature discriminants on geometric proportions, surface properties and iconic features
(34). The use of geometric proportions is based on the idea that certain distances within
the human face may vary between people, but are quite invariant for the same person.
Examples include the vertical distance from the eyes to the upper side of the mouth and
the horizontal width of the face at the nose.

Surface, or texture, characteristics depend on extraction of “smoothness” information
about certain areas of the face, such as the forehead or the cheek, and are calculated by
relating the mean intensity gradient at a particular region to the entropy of the histogram
in that region. Iconic features depend on characteristics of certain subimages of the original
image. The shape of the chin, for example, could be a valid, discriminable feature. Another
textural feature proposed by the authors, but not applied during the study described, is a measure
of the standard deviation of the fractal dimension, to determine the presence of hair.

Once  the  features  were  extracted  from  the  image,  they  were  correlated  with  the 
features  of  all  faces  present  in  the  database.  Measures  of  similarity  and  Euclidean  distance 
were  accomplished,  and  the  system  identified  the  face  in  the  image  by  selecting  the  best 
match.  Preliminary  results  from  the  use  of  this  system  have  been  promising,  and  the 
authors intend to continue development with a larger database of faces, more different
sensors  for  feature  detection,  and  various  other  enhancements. 
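
The best-match selection by Euclidean distance can be sketched as follows. This is not Mannaert and Oosterlinck's code: the feature values, the names in the database, and the function name are invented for illustration, and the similarity measures they also used are not shown.

```python
import numpy as np

def best_match(query, database):
    """Return the name and distance of the database entry whose feature
    vector is closest to the query in Euclidean distance."""
    names = list(database)
    distances = [np.linalg.norm(np.asarray(query) - np.asarray(database[n]))
                 for n in names]
    i = int(np.argmin(distances))
    return names[i], float(distances[i])

# Hypothetical geometric features, e.g. (eye-to-mouth distance, face width at nose).
database = {
    "subject_a": [0.42, 0.61],
    "subject_b": [0.55, 0.48],
    "subject_c": [0.38, 0.70],
}

name, distance = best_match([0.54, 0.50], database)
print(name, round(distance, 4))
```

A pure best-match rule always returns some identity, so a practical system would also need a distance threshold to reject faces that are not in the database.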

Biologically Motivated Feature Extraction. A somewhat different method of
feature  extraction  is  presented  by  Manjunath,  whereby  features  are  extracted  without  any 
assumptions  concerning  face  structure  (33).  His  work  is  biologically  motivated,  in  the 
sense  that  it  attempts  to  emulate  the  human  visual  system’s  ability  to  recognize  images 
that  don’t  necessarily  lend  themselves  to  simple  geometrical  representations. 

The  development  of  the  feature  detection  model  is  motivated  by  the  early 
processing  stages  in  the  visual  cortex  of  mammals.  The  cells  in  the  visual 
cortex  can  be  classified  into  three  broad  functional  categories:  simple,  complex, 
and  hypercomplex.  Of  particular  interest  here  is  the  end-inhibition  property 
exhibited  by  the  hypercomplex  cells.  This  property  refers  to  the  response  of 
these  cells  to  short  lines  and  edges,  line  endings,  and  sharp  changes  in  curvature 
(e.g.,  corners).  Since  these  correspond  to  some  of  the  low  level  salient  features 
in  an  image,  these  cells  can  be  said  to  form  in  some  sense  a  low  level  feature 
map  of  the  intensity  image  (33:p374). 

Similarly to Seitz and Bichsel’s face segmentation work, Manjunath proposes the extraction
of oriented feature information at different scales. He obtains the information by using
Gabor wavelet transformations on the original intensity image, where Gabor functions are
simply Gaussians modulated by complex sinusoids. The wavelet transformation decomposes
the original signal into a linear combination of basis functions, which are obtained
from simple dilations and translations of a “mother” wavelet (for an in-depth treatment of
wavelet transform theory, see (8)). The decomposed signals represent different spatial resolutions,
and the information contained within the different levels can be used to localize
curvature  changes.  An  example  of  such  features  found  by  this  method  is  seen  in  Figure  2.1, 
where  the  input  image  is  a  hand-drawn  hammer,  and  the  processed  image  shows  a  star  at 
each  location  of  changing  curvature.  Figure  2.2  shows  the  same  technique  applied  to  two 
face  images.  The  curvature  information  at  those  points  represents  the  features  within  the 
image,  and  an  appropriate  cost  function  is  used  to  determine  whether  two  different  feature 
maps  represent  the  same  face. 
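
The description of a Gabor function as a Gaussian modulated by a complex sinusoid can be sketched in a few lines. The following is only an illustration under stated assumptions, not Manjunath's implementation: the function names, the kernel size, the ratio between the Gaussian width and the wavelength, and the number of orientations are all hypothetical choices made here.

```python
import numpy as np

def gabor_kernel(size, sigma, wavelength, theta):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinate along the direction theta; the sinusoid oscillates along it.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier

def gabor_bank(size=15, wavelengths=(4.0, 8.0), n_orientations=4):
    """Small family of kernels at several scales and orientations."""
    bank = {}
    for wavelength in wavelengths:
        # Assumed fixed ratio, so each scale is a dilation of the same shape.
        sigma = 0.5 * wavelength
        for j in range(n_orientations):
            theta = j * np.pi / n_orientations
            bank[(wavelength, j)] = gabor_kernel(size, sigma, wavelength, theta)
    return bank
```

Filtering an intensity image with each kernel in such a bank would give one oriented response per scale; locating curvature changes from those responses is a further step that this sketch does not attempt.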


Figure  2.1  Curvature  changes  found  via  the  wavelet  transformation  of  a  hand-drawn 
image  (33). 


Facial  Thermographic  Feature  Extraction  Prokoski,  et  al,  have  presented  an 
identification  system  based  on  the  extraction  of  facial  thermographic  features  (42).  They 
claim  that 


.  .  .  the  thermal  measurements  of  individuals  under  repeated  conditions  are 
highly  repeatable.  The  mean  and  standard  deviation  temperature  of  a  group 
of individuals over a period of several months were 30.8°, +/- 0.032° C with a
coefficient of variation of 0.1

Figure 2.2 Manjunath’s technique applied to faces (33).

The  amount  of  information  contained  within  a  thermographic  image  of  a  face  is  quite  large, 
and  though  the  authors  have  not  performed  extensive  testing,  they  feel  the  information  is 
sufficient  to  discriminate  between  and  identify  individuals. 

Three-dimensional Feature Extraction Jia and Nixon propose a method of extracting
profile information from a two-dimensional, front view of a face, and using that
information as an addition to a standard geometrically based feature set (25). The authors
assume the positions of the eyes within the image have been located with a high degree of
accuracy, and thus the center, vertical line of the face can be found. They then calculate
the intensity projection along the direction of that line, where the intensity projection
$P_w(z)$ of image $f(x, y)$ along the line $z$ in direction $w$ is simply

\[ P_w(z) = \int f(x, y) \, dw \]

Though  not  precisely  the  profile  of  the  face,  this  projection  represents  the  relation  between 
the  intensity  peaks  and  valleys  along  the  center  line  of  the  face.  Figure  2.3  shows  an 
example  of  this  intensity  extraction  applied  to  a  face.  Seven  series  of  feature  data  were 
derived  from  this  projection: 


1. the resampled projection, p(t).

2.  the  autocorrelation  of  p(t). 

3.  the  Dyadic  autocorrelation  function  of  p(t). 

4.  the  Fourier  transform  of  p(t). 

5. the Walsh transform of p(t).

6. the Fourier power spectrum of p(t).

7.  the  Walsh  power  spectrum  of  p(t). 
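
Several of the feature series listed above can be sketched with NumPy. This is a minimal illustration, not Jia and Nixon's procedure: it assumes the projection is formed by summing the intensities of a narrow vertical strip around the center line, one value per image row, and the function names and the strip half-width are hypothetical. The Walsh-based and dyadic features are omitted.

```python
import numpy as np

def center_line_projection(image, center_col, half_width):
    """Sum the intensities of a vertical strip around the center line, one value per row."""
    lo = max(center_col - half_width, 0)
    hi = min(center_col + half_width + 1, image.shape[1])
    return image[:, lo:hi].sum(axis=1).astype(float)

def resample(p, n_samples):
    """Linearly resample the projection to a fixed number of samples."""
    old = np.linspace(0.0, 1.0, len(p))
    new = np.linspace(0.0, 1.0, n_samples)
    return np.interp(new, old, p)

def autocorrelation(p):
    """Autocorrelation of p for non-negative lags 0 .. len(p) - 1."""
    full = np.correlate(p, p, mode="full")
    return full[len(p) - 1:]

def fourier_power_spectrum(p):
    """Squared magnitude of the one-sided discrete Fourier transform of p."""
    return np.abs(np.fft.rfft(p)) ** 2
```

Resampling to a fixed length is what makes projections from faces of different heights comparable before the other series are computed.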


Figure  2.3  Intensity  projection  of  a  face  (25). 


The authors feel that these features, combined with traditional geometrical measurements,
provide superior discrimination capability to feature sets consisting of geometrical
measurements alone. Their results support this conclusion, though the effects of hair falling
on the forehead, beard and moustache growth, and other physical changes to the
pseudoprofile were not addressed.

Gordon  has  proposed  a  face  recognition  methodology  based  on  the  extraction  of 
depth  and  curvature  features  from  a  face.  The  major  difference  between  this  method  and 
most other feature extraction methods is that features here are actually based on depth
measurements, and not on intensity values. Because intensity based image descriptions
depend on intensity variations, low contrast features such as cheeks and foreheads are very
difficult, if not impossible, to describe. This approach is also different from biologically
motivated ones, because "although it is unlikely that humans base their representation or
comparison of shape on the accurate perception of depth, we propose the use of depth data
because at our current state of technology it is the most straight forward way to input or
record complex shape information for machine analysis" (20:p235). A rotating laser scanner
was  used  to  extract  the  depth  information,  which  in  turn  generated  a  surface  embedded  in 
a  three-dimensional  space.  Curvature  measurements  across  the  surface  were  then  obtained 
and  templates  were  produced  corresponding  to  specific  physical  facial  features.  See  Figure 
2.4 for an example of the curvature maps obtained, and Figure 2.5 for local maxima/minima
plots  derived  from  the  maps.  Face  identification  could  then  be  accomplished  by  simple 
template  matching  or  by  measuring  the  volumetric  difference  between  a  test  surface  and  a 
known  surface  when  both  were  normalized  with  respect  to  a  small  set  of  common  feature 
points.

Figure 2.4 Principal curvatures for a single face: magnitude (a) and direction (c) of
maximum curvature, magnitude (b) and direction (d) of minimum curvature.
Umbilic points are marked in (c) & (d); filled circles are points with positive
index and open circles are points with negative index (20).

Figure 2.5 (a) Ridge lines: local maxima of ($k_{max} > thresh_r$), and (b) valley lines: local
minima of ($k_{min} < thresh_r$) (20).

2.2.2.2 Holistic Recognition. Turk and Pentland, at the Massachusetts
Institute  of  Technology,  have  been  quite  active  in  pursuit  of  the  application  of  holistic 
recognition  techniques  to  the  face  recognition  problem.  They  contend  that  "individual 
facial  features  such  as  the  eyes  or  nose  may  not  be  as  important  to  human  face  recognition 
as  the  overall  pattern  capturing  a  more  holistic  encoding  of  the  face”  (65).  This  approach 
leads  to  less  dependency  on  detailed  geometries,  and  may  lead  to  simpler  computational 
models well suited to use in certain constrained environments such as offices. In their
scheme,  face  images  were  decomposed  into  characteristic  feature  images,  called  eigenfaces. 
These  eigenfaces  occupy  unique  positions  in  what  is  known  as  face-space,  and  identification 
is  made  by  projecting  the  image  to  be  identified  into  the  face-space,  then  determining  the 
eigenface that is closest to the test face. Turk and Pentland claim the approach has
advantages  over  other  face  recognition  methodologies  in  speed  and  simplicity,  learning 
capacity,  and  insensitivity  to  small  or  gradual  changes  in  the  face  image. 

Fleming  and  Cottrell  developed  a  similar  system,  training  a  back-propagation  neural 
network  to  automatically  extract  holistic  features  from  face  images  and  save  them  as 
"holons,”  similar  to  the  eigenfaces  of  Turk  and  Pentland  (14).  The  resultant  network  could 
recognize  new  images  of  familiar  faces,  categorize  unknown  images  as  to  their  "faceness,” 
and  to  a  degree  categorize  faces  as  to  their  gender. 

Researchers at AFIT have also implemented a neural network based, holistic recognition
system (52, 27, 17). Because the face verification portion of the system to be developed
for this thesis depends on the AFIT implementation, details of its operation will be
provided in Chapter 3. In a thesis by Ken Runyon, we find that the system performs fairly
well for some tests, but does have limitations (52). The major one was in recognition
accuracy over multiple days of testing; that is, in recognizing an image of a face taken at
a different time than the image the system was trained on. This was overcome to a large
extent by training the system with images taken over multiple days, which allowed the net
to capture a more general "view" of what the face looked like.

2.3 Speaker Identification

As  with  the  recognition  of  human  faces,  we  tend  to  take  for  granted  our  ability  to 
recognize  the  voice  of  a  familiar  person,  even  if  the  voice  has  been  altered  in  some  way 
(changes  in  pitch  or  changes  due  to  illness,  for  example).  Indeed,  according  to  Levinson 
and Roe, humans are unable to appreciate the difficulties that speaker recognition poses
for a computer, since humans comprehend speech so easily (29). Two schemes
are generally used for speaker identification, one based on extraction of features within an
utterance,  and  the  other  on  the  use  of  a  model  of  the  mammalian  auditory  system. 

2.3.1  Feature  Extraction.  Most  feature-based  speaker  identification  systems  rely 
on  some  preprocessing  of  the  acoustic  speech  signal,  selecting  features  which  attempt  to 
model  the  physical  makeup  of  an  individual’s  vocal  tract  and  using  those  features  to  build 
a  database  of  known  individuals  (39).  Many  preprocessing  methodologies  have  been  used 
in the past, and Colombi provides an excellent synopsis of those techniques in his 1992
thesis  (9),  reproduced  here  in  Table  2.1. 

Once  the  features  have  been  extracted  from  the  acoustic  signal,  it  remains  to  classify 
them,  thereby  determining  with  what  probability  a  test  subject  belongs  to  some  class. 
Again,  Colombi  has  provided  a  summary  of  different  classification  techniques  used  over 
the  last  several  years  (Table  2.2). 

The speaker verification system to be implemented as part of this thesis effort will be
based on the recognition system developed at AFIT by Colombi. Details of that system,
as well as its modification to perform the verification function, will be provided in Chapter
3 of this document.

Table 2.1 Preprocessing Techniques for Speaker Recognition Feature Extraction (9)

| Feature | Author (Date) | Comments |
|---|---|---|
| Filterbanks | Pruzansky (1963, 1964) | 100 Hz - 10 kHz, various averages of (and between several) filterbank outputs over time were examined (39). |
| Spectral Characteristics | Wolf (1972) | Nasal consonants, fricatives, vowels, pitch and vowel duration (39). |
| Pitch Contours | Atal (1972) | Karhunen-Loeve transform on pitch contours (39). |
| Filterbank Correlation | Li and Hughes (1974) | Correlations among filterbank energies (39). |
| LPC Cepstral | Atal (1974, 1976) | Comparison to log-area ratios, correlation coefficients, LPC coefficients (2, 1). |
| Spectral Characteristics | Sambur (1975) | Formant frequencies, LPC poles, pitch, some temporal patterns (39). |
| Formants | Goldstein (1976) | Vowels, 199 ranked features (39). |
| Linear Prediction | Sambur (1976) | LPC, reflection, log-area ratios, found orthogonal reflection coefficients best (least significant projections) (39). |
| Long-Term Statistics | Markel (1977, 1979) | Mean and standard deviation of pitch, reflection coefficients (39). |
| Mel Cepstral | Davis and Mermelstein (1980) | Cosine expansion of the spectrum, comparison to linear and LPC cepstral (13). |
| Delta Cepstral | Furui (1981) | Polynomial expansion over time (15). |
| Log Area Ratios | Schwartz (1982) | Examined different classifiers using spectral log area ratios (56). |
| LPC Cepstral | Oglesby and Mason (1990) | 10th order LPC derived cepstral (37). |
| Line Spectral Pair | Liu (1990) | Several variants of LSP - Even, Odd, Mean and Difference of LSPs (31). |
| Mel Cepstral and LPC | Bennani (1990) | 12th order LPC and Mel Frequency Cepstral, based on 24 triangular filters (4). |
| LPC Cepstral | Gaganelis and Frangoulis (1990) | 10th order LPC (16). |
| Delta LPC Cepstral | Furui (1991) | LPC cepstral, first order regression every 88 msec period (35). |
| Delta Cepstral / Cepstral | Rosenburg (1990, 1991) | 12th order cepstral and delta-cepstral coefficients, weighted using a sinusoidal "lifter" (48, 49). |
| Mel Cepstral | Oglesby and Mason (1991) | 12 filterbanks, Mel frequency spaced (38). |
| Eigenvector Analysis | Bennani (1991) | LPC and Mel cepstrum covariance, mean and two eigenvectors (3). |
| Filter banks | Higgins (1991) | Power output of 14 uniformly spaced frequency banks (23). |
| Auditory Model | Hattori (1992) | Seneff auditory model mean rate response, 40 channels (22). |
| Delta Cepstral / Cepstral | Tseng et al (1992) | Linear combination of cepstral and delta cepstral. Found cepstral alone performed better recognition (64). |
| LPC Cepstral | Savic and Sorensen (1992) | 20th order cepstral derived from only 12th order LPC (54). |

Table 2.2 Classification Techniques for Speaker Recognition (9)

| Classifiers | Author (Date) | Speakers, ID %, Comments |
|---|---|---|
| Distortion | Atal (1974) | 10 speakers, 98% identification, Mahalanobis Distance using pooled intra-speaker covariance (2). |
| DTW | Furui (1981) | 20, Dynamic Time Warp distortion measurement on fixed sentences (15). |
| K-means, Gaussian Estimation | Schwartz (1982) | Compared Gaussian classifiers to K-means and Mahalanobis Distance, non-parametric outperformed (56). |
| HMM | Poritz (1982) | Application of 5 state ergodic HMM to speaker verification (40). |
| VQ | Soong (1985) | First Speaker dependent codebooks, voiced and unvoiced speech (59). |
| VQ | Soong (1988) | 2 Codebooks, 1 instantaneous and 1 temporal (60). |
| MLP | Oglesby and Mason (1990) | 10, 92%, Backprop learning, single layer with 16 - 128 hidden nodes. Equal recognition to VQ 85.1.10. |
| K-means / LVQ | Bennani et al (1990) | 10, 95 - 97% (4). |
| HMM | Rosenburg et al (1990) | 20, 98.8 - 99.1%, Used k-means to segment the utterance into acoustic segment units, also examined phonetically labeled speech (48). |
| HMM | Savic and Gupta (1990) | 43, 97.8%, 5 HMM models representing broad classes (55). |
| GMM | Rose and Reynolds (1990) | 12, 89%, Only 1 sec of test speech (46). |
| Binary Partition | Rudasi and Zahorian (1991) | 47, 100%, TIMIT corpus, need N(N-1)/2 binary MLP classifiers (51). |
| RBF NN | Oglesby and Mason (1991) | 40, 89% true talker, different mannerisms of speech (38). |
| GMM | Rose et al (1991, 1992) | 10, 77.8%, Integrated noise model into GMM, GMM on original clean speech - 99.5% (46, 47). |
| Discriminator Counting | Higgins and Bahler (1991) | 24, 80% true talker, KING corpus, multivariate gaussian, count wins/speaker summed over frames. |
| VQ | Matsui and Furui (1991) | 9, 98.5 - 99.0%, Voice/Unvoiced or 2-state HMM, New Distortion measure (DIM), Talker variability normalization (TVN) individually weights features (35). |
| HMM | Rosenburg (1991) | 20, 96.5 - 99.7%, Whole word L-to-R HMM, text dependent (digits), compared to VQ (49). |
| Time Delay NN | Bennani and Gallinari (1991) | 20, 98%, First a Male / Female TDNN, then a 10 output (speakers) TDNN using 2 hidden layers (hierarchical) (3). |
| HMM, VQ, ANN | Hattori (1992) | 24, 100%, TIMIT corpus (females), Predictive NN (recurrent) within HMM, compared to VQ and MLP classifiers (22). |
| CPAM (GMM) | Tseng et al (1992) | 20, 98.3% identification, CPAM - Continuous Probability Acoustic Map, mixtures of Gaussian kernels with and without HMM (64). |
| MLP | Gong and Haton (1992) | 72, 89 - 100%, Trained MLP to interpolate between speaker utterances (phoneme), needs labeled speech (vowels). |
| VQ | Kao et al (1992) | 26 (51), 93.3% (67.6), KING corpus, 11 broad class codebooks of 10 vectors, Needs labeled speech (26). |


2.4 Multiple Sensor Fusion

Humans are able to quite easily integrate information from different senses (hearing
and sight, for example) and make decisions based on that integration. Combining information
in machines, however, can be somewhat more problematic. This task of automating
the integration of multiple sensors is commonly known as sensor fusion, and is defined by
Thomoupoulous as

.  .  .  the  process  of  integrating  raw  and  processed  data  into  some  form  of 
meaningful inference that can be used intelligently to improve the performance
of  the  system,  measured  in  any  convenient  and  quantifiable  way,  beyond  the 
level  that  any  one  of  the  components  of  the  system  separately  or  any  subset  of 
the  system  components  partially  combined  could  achieve.  (62) 

Three general schools of thought exist for the fusion of information from multiple
sensors, and are described in the following sections and illustrated in Figure 2.6 (28).

Fusion of Observations Each individual sensor $i$ provides an observation vector $s_i$ to a
centralized decision unit $D$ that determines the decision probability

\[ q = P(u = 1 \mid s_1, \ldots, s_n) = D(s_1, \ldots, s_n), \]

where $u$ is the class being considered. $q$ is provided to a decider within $D$ that will make
the classification decision.

Fusion of Decisions With this method, each sensor $i$ is provided with its own forecaster,
$H_i$, that determines the single sensor-based probability

\[ x_i = P(u = 1 \mid s_i) = H_i(s_i). \]

Each $x_i$ is then mapped by a decider $\beta_i$ into a binary decision $a_i$, and the resulting
decision vector $(a_1, \ldots, a_n)$ is presented to a fusion rule prescribing the final decision.
Reibman and Nolte (44, 43) and Chair and Varshney (7) have proposed methods to optimize
both the fusion rule and the individual sensor decision unit rules.

Fusion of Probabilities As with the second method, each sensor $i$ is provided with
its own forecaster $H_i$ that maps an observation vector into a classification probability
$x_i = P(u = 1 \mid s_i) = H_i(s_i)$. The resultant vector of the individual probabilities is then
input into a fusion rule $H$ that provides the a posteriori fused classification probability.



Figure 2.6 Three methods of fusing information from multiple sensors: a) Fusion of
Observations, b) Fusion of Decisions, c) Fusion of Probabilities
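
The three fusion schemes can be contrasted in a short sketch. It is illustrative only: the function names are hypothetical, the decision-fusion rules shown (AND, OR, majority vote) are simple stand-ins for the optimized rules cited above, and the probability-fusion rule is a naive product rule that assumes the sensors are conditionally independent given the class, an assumption made here and not taken from the cited work.

```python
import numpy as np

def fuse_observations(observations, central_forecaster):
    """Fusion of observations: all sensor vectors go to one central unit that returns q."""
    return central_forecaster(np.concatenate(observations))

def fuse_decisions(probabilities, thresholds, rule="majority"):
    """Fusion of decisions: threshold each sensor probability, then combine the bits."""
    decisions = [int(x >= t) for x, t in zip(probabilities, thresholds)]
    if rule == "and":
        return int(all(decisions))
    if rule == "or":
        return int(any(decisions))
    # Default: strict majority vote over the binary decisions.
    return int(2 * sum(decisions) > len(decisions))

def fuse_probabilities(probabilities, prior=0.5):
    """Fusion of probabilities: product rule under assumed conditional independence."""
    x = np.clip(np.asarray(probabilities, dtype=float), 1e-9, 1.0 - 1e-9)
    prior_log_odds = np.log(prior / (1.0 - prior))
    # Each sensor contributes its log-odds; the shared prior is counted only once.
    log_odds = (1 - len(x)) * prior_log_odds + np.sum(np.log(x / (1.0 - x)))
    return float(1.0 / (1.0 + np.exp(-log_odds)))
```

With two sensors reporting 0.8 and 0.7 and an even prior, the product rule gives a fused probability of 28/31, or about 0.90, which is higher than either sensor alone because the two agree.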

2.5  Conclusion 

This search of the current literature has briefly outlined research efforts into the
areas of face recognition, speaker identification, and multi-sensor integration or fusion. The
latter two areas have received significant attention over the last few years, and research
into the first is becoming increasingly prevalent. We have found that identification systems
based on either faces or speech have seen considerable success, but that not all of the problems
inherent to such a task have yet been solved. We have not found any attempt to combine
the capabilities of face and speech recognition systems into a single, cohesive unit,
but have seen that the general problem of fusing information from multiple sources has
been addressed and successful solutions have been developed; such methodologies should
be applicable to the identity verification problem being addressed by this thesis.


The  next  chapter  shall  present  the  methodology  to  be  followed  in  the  performance 
of  this  research,  and  Appendix  A  provides  the  actual  implementation  techniques  used  and 
the software developed and modified for this effort.

III. Methodology


3.1  Introduction 

The  system  to  be  developed  in  this  research  is  based  to  a  large  degree  on  previous 
and  concurrent  AFIT  thesis  efforts.  The  fundamental  building  blocks  include  a  neural-net 
based  face  recognizer,  detailed  in  theses  by  Krepp,  Runyon,  and  Gay  (27,  52,  17),  and 
a  distortion  based  speaker  identifier,  described  by  Colombi  (9).  These  systems  will  be 
modified to perform the verification task, and will then be fused to form a multiple sensor
verification mechanism. Techniques to enhance the operation of the face verifier portion
of the system will be explored in this thesis; efforts to improve the speaker verifier will be
addressed  in  a  collateral  thesis  by  Prescott  (41). 

The remainder of this chapter will be organized as follows: First, the basic identification
systems will be described, followed by a description of the efforts needed to modify
the  systems  to  function  as  verifiers.  Next,  the  technique  to  be  used  to  fuse  the  disparate 
verification  systems  will  be  presented,  and  finally  methods  to  improve  the  performance  of 
the  overall  system  will  be  addressed. 

3.2  Verification  Building  Blocks 

3.2.1  Face  Recognizer.  The  face  recognizer  used  in  this  research  is  based  on  the 
system  developed  by  Turk  and  Pentland  at  the  Massachusetts  Institute  of  Technology,  as 
well as work conducted at AFIT by Suarez, Goble, et al. (65, 61, 18, 19, 27, 52). An
$n \times n$ image of a face is converted into a vector of length $n^2$, and this high dimensional
vector is then projected into a lower dimensional space via the Karhunen-Loeve Transform
(KLT). The coefficients describing this new, reduced vector are then presented to a
back-propagation neural network for classification.

3.2.1.1 Segmentation. The segmentation portion of the system has been
approached in two fundamental ways. In the first, the faces within the image are manually
segmented, producing a very constant placement of the face within the image. When using
this method, the assumption is that a technique will be made available at some point
to automate this placement, a non-trivial task. The second method addresses this subject,
and  involves  a  frame  differencing  technique  explored  by  Gay  (17).  Using  the  notion  that 
no  one  is  able  to  keep  his  or  her  face  perfectly  still,  two  successive  image  frames  of  the 
target  are  captured,  and  one  is  subtracted  from  the  other.  The  resultant  motion  image  is 
scanned  for  head  shapes,  and  if  one  is  found,  it  will  be  used  as  a  template  to  locate  the 
face  within  the  image.  The  located  face  will  then  be  enlarged  and  moved  to  a  standard 
position  in  the  image. 
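
The frame differencing idea can be sketched as follows. This is a simplified stand-in, not Gay's implementation: the search for head shapes in the motion image is replaced here by a plain bounding box around the changed pixels, and the function names and threshold value are hypothetical.

```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold):
    """Mark pixels whose intensity changed by more than threshold between two frames."""
    diff = np.abs(frame_a.astype(float) - frame_b.astype(float))
    return diff > threshold

def bounding_box(mask):
    """Return (top, bottom, left, right) of the moving region, or None if nothing moved."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])
```

The region inside the box would then be cropped, enlarged, and moved to a standard position before feature extraction.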

McCrae  has  explored  an  additional  segmentation  methodology  in  a  concurrent  thesis 
(36).  She  uses  a  neural  net  based  color  segmentation  scheme  to  detect  faces  within  a  color 
image by discriminating between ‘face’ color and ‘non-face’ color; she is then able to detect
the  eyes  within  the  faces  using  a  similar  approach.  An  example  of  face  segmentation  and 
eye  detection  using  the  color  discrimination  system  is  provided  in  Figure  3.1. 


Figure  3.1  Eye  point  detection  using  McCrae’s  system  (36). 

In  the  course  of  this  research  we  also  briefly  examined  face  segmentation  using  a 
three-dimensional  temporal  wavelet  introduced  by  Burns  in  his  Doctoral  dissertation  (6). 
Such a wavelet is able to detect motion within a sequence of two-dimensional images by
analyzing the frequency content over time (the succession of images) and space, rather than
strictly space. Figure 3.2 shows one of the original source images from a sequence showing
a person nodding their head, and Figure 3.3 demonstrates the result of performing one level
of a wavelet decomposition. Figure 3.3a shows the approximation image, and 3.3b, 3.3c, and
3.3d show, respectively, the horizontal, vertical, and diagonal high frequency components of
the images over time. This technique appears to do a good job of decreasing the importance
of the background, and holds great promise in the motion segmentation arena.

Figure  3.3  Result  of  one  level  of  temporal  wavelet  decomposition. 

3.2.1.2 Feature Reduction via the Karhunen-Loeve Transform. When working
with an $n \times n$ image, the holistic recognition approach has led to a traditional feature set
consisting of the intensity value of each of the pixels within the image. A problem with
this approach, however, is that we are left with a feature vector of dimensionality $n^2$, which
may become extremely unwieldy as we attempt to perform some training/classification process.
Therefore, it behooves us to find an intelligent method to reduce that dimensionality
without sacrificing the information inherent to the original feature set. Of equal importance,
we would like to find some way to determine which of these reduced features are
most important; that is, which features best differentiate images of different targets.¹

Certain  data  transformations  have  been  developed  that  provide  us  with  methods 
of  identifying  which  feature,  or  combination  of  features,  best  allow  such  differentiation. 
In  classification  problems  where  one  does  not  know  the  various  probability  densities  of 
the  classes  being  studied,  methods  of  orthogonal  expansion  have  proven  quite  useful  in 
providing  a  new  feature  space  in  which  to  project  the  existing  features  (63:269).  A  Fourier 
series  expansion  allows  such  a  projection  of  periodic  processes,  but  certain  conditions  must 
be met to use them with nonperiodic processes such as our faces². The Karhunen-Loeve
Transform  (KLT),  on  the  other  hand,  allows  a  non-periodic  random  process  to  be  expressed 
as  a  series  of  orthogonal  functions  with  uncorrelated  coefficients,  and  will  prove  useful  in 
our  classification  problem. 

While  the  basis  set  of  the  Fourier  Transform  consists  of  sines  and  cosines,  the  basis 
set  of  the  KLT  is  made  up  of  the  eigenvectors  of  the  covariance  matrix  of  the  source  data. 
The  derivation  is  relatively  straightforward,  and  can  be  stated  in  words  as  follows: 

1. Compute the mean and covariance functions from the population of training images.

2.  Compute  the  eigenvalues  of  the  statistically  normalized  covariance  matrix,  where 
statistically  normalized  simply  means  the  mean  image  has  been  subtracted  from 
each  individual  image  in  the  training  population. 

3.  Calculate  the  eigenvector  corresponding  to  each  eigenvalue  computed  in  the  previous 
step. 

4. To reduce the dimensionality of the original images to some value k, select the k
eigenvectors corresponding to the k largest eigenvalues and matrix multiply each
image vector by the eigenvectors. The k eigenvectors form the new KLT basis set,
and any $n^2$-dimensional image may now be reduced to k coefficients of the basis set.

¹We should mention at this point that if we are performing classification using neural networks (to
be discussed in the next section), Ruck, et al, have developed a saliency measure to determine the input
features with the greatest influence on the output (50). Our purpose here is to determine these features
before attempting classification.

²If an assumption is made that the non-periodic sequence under study is actually one period of a periodic
sequence, Fourier techniques can be used. Goble showed that under such an assumption, the Discrete Cosine
Transform was indeed effective in providing a recognition capability (18).
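
The four steps above can be sketched directly in NumPy. This is a minimal sketch under stated assumptions, not the AFIT implementation: the function names are hypothetical, images are assumed to be already flattened into rows of an M x n² array, and the full covariance matrix is formed explicitly, which is practical only for small images.

```python
import numpy as np

def klt_basis(images, k):
    """images: (M, n*n) array, one flattened image per row.

    Returns the mean image, the (n*n, k) basis of leading eigenvectors,
    and their eigenvalues in descending order.
    """
    mean = images.mean(axis=0)                      # step 1: mean image
    centered = images - mean                        # subtract the mean image
    cov = centered.T @ centered / images.shape[0]   # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # steps 2 and 3 (ascending order)
    order = np.argsort(eigvals)[::-1][:k]           # step 4: keep the k largest
    return mean, eigvecs[:, order], eigvals[order]

def klt_project(image_vectors, mean, basis):
    """Reduce one image vector, or a stack of them, to k KLT coefficients."""
    return (image_vectors - mean) @ basis
```

For example, `klt_project(images, mean, basis)` returns an M x k array whose columns are uncorrelated, with variances equal to the retained eigenvalues.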


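Let’s now look at the specifics of the KLT process. First define the $n^2$-dimensional image
as a feature vector

\[ x^i = \begin{pmatrix} x_1^i \\ x_2^i \\ \vdots \\ x_{n^2}^i \end{pmatrix} \]

where $x^i$ is the $i$th image within the population. The mean vector of the training population
of $M$ images is defined as

\[ m_x = \frac{1}{M} \sum_{i=1}^{M} x^i \]

and the covariance matrix as

\[ C_x = \frac{1}{M} \sum_{i=1}^{M} x^i (x^i)^T \]

After subtracting the mean image from all the source images, we are left with a
covariance matrix of the form

\[ C_x = \frac{1}{M} \begin{pmatrix}
\sum_{i=1}^{M} x_1^i x_1^i & \cdots & \sum_{i=1}^{M} x_1^i x_{n^2}^i \\
\vdots & \ddots & \vdots \\
\sum_{i=1}^{M} x_{n^2}^i x_1^i & \cdots & \sum_{i=1}^{M} x_{n^2}^i x_{n^2}^i
\end{pmatrix} \]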


This covariance matrix provides us with a measure of the importance of each dimension
within the original set of images. A high variance in a single dimension across all
images indicates a large amount of differentiability information in that dimension;
conversely, a small variance in a dimension indicates only a small amount of information.
Our goal, then, is to maximize the values in the diagonal of the covariance matrix
(corresponding to single-dimension variance), and minimize the variation in all other locations
(corresponding to co-dimensional variance), thereby orthogonalizing the multi-dimensional
space. This can be accomplished by finding the eigenvectors (and eigenvalues) of the
covariance matrix, thus transforming the matrix into an orthogonal space and ensuring an
optimal distance between dimensions (optimality is implied by orthogonality in this case).
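
The orthogonalization claim can be checked with a short derivation. The symbols $U$, $\lambda_k$, and $\Lambda$ are introduced here for illustration only; $C_x$ denotes the covariance matrix of the mean-subtracted images and $m_x$ the mean image.

```latex
% Eigenvectors u_k of the symmetric matrix C_x, with eigenvalues \lambda_k:
C_x u_k = \lambda_k u_k, \qquad U = [\, u_1 \; u_2 \; \cdots \; u_{n^2} \,], \qquad U^T U = I

% Express every mean-subtracted image in the eigenvector basis:
y^i = U^T (x^i - m_x)

% The covariance of the transformed data is then diagonal:
C_y = \frac{1}{M} \sum_{i=1}^{M} y^i (y^i)^T
    = U^T C_x U
    = \Lambda
    = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{n^2})
```

Every off-diagonal (co-dimensional) term of $C_y$ is zero, and each diagonal term $\lambda_k$ is the variance of the data along $u_k$, which is why ranking eigenvectors by eigenvalue ranks the new dimensions by the variance they capture.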

We can then rank the eigenvectors in descending eigenvalue order, placing them in
the eigenmatrix

\[ \begin{pmatrix}
e_{11} & \cdots & e_{k1} \\
\vdots & & \vdots \\
e_{1n^2} & \cdots & e_{kn^2}
\end{pmatrix} \]

This matrix is our new basis set, with k orthogonal dimensions into which any
$n^2$-dimensional image can be transformed. This transformation is accomplished by the
simple process

\[ y_k = (x^i - m_x)^T u_k, \qquad k = 1, \ldots, M \]

where $u_k$ is the $k$th eigenvector and $y_k$ represents the projection of the original image
into our new eigenspace. Note
that the selection of k is usually dependent upon the desired reconstruction accuracy.
Krepp found that 32 x 32 face images were sufficiently represented by twenty KLT coefficients for classification to an accuracy of 97.0% (based on 300 images of 10 different people); this implementation out-performed the system when using all pixel values, where an accuracy of 93.3% was attained. Therefore, he was able to reduce an $n^2 = 1024$ dimensional feature vector into one of only 20 dimensions and still obtain excellent classification performance. This is the feature extraction approach used for the base face recognizer building block, but because we are not necessarily looking for optimal face verification performance (we are instead more interested in the performance resulting from the fusion of two verification systems) we will use a reduced number of KLT coefficients. That number will be determined as part of the testing process, to be elaborated on in the next chapter.
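The KLT steps described above (mean removal, covariance, eigenvectors ranked by eigenvalue, projection) can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in this research: the function names are hypothetical, and power iteration with deflation is assumed as the eigen-solver purely to keep the example free of external libraries.

```python
import math
import random


def mean_vector(images):
    """Component-wise mean m_x of M equal-length feature vectors."""
    count = len(images)
    return [sum(column) / count for column in zip(*images)]


def covariance_matrix(images, mean):
    """(1/M) * sum of outer products of the mean-removed feature vectors."""
    count = len(images)
    dim = len(mean)
    centered = [[x - m for x, m in zip(image, mean)] for image in images]
    return [[sum(c[a] * c[b] for c in centered) / count for b in range(dim)]
            for a in range(dim)]


def leading_eigenpairs(matrix, k, iterations=500, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric matrix.

    Power iteration with deflation is an assumption of this sketch; any
    symmetric eigen-solver could be substituted.
    """
    rng = random.Random(seed)
    dim = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        start = math.sqrt(sum(v * v for v in vec))
        vec = [v / start for v in vec]
        for _ in range(iterations):
            nxt = [sum(work[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # nothing left to extract
                break
            vec = [v / norm for v in nxt]
        value = sum(vec[a] * sum(work[a][b] * vec[b] for b in range(dim))
                    for a in range(dim))
        pairs.append((value, vec))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        for a in range(dim):
            for b in range(dim):
                work[a][b] -= value * vec[a] * vec[b]
    return pairs


def klt_coefficients(image, mean, basis):
    """Project one image: y = (x - m_x)^T U_k, with basis holding the k eigenvectors."""
    centered = [x - m for x, m in zip(image, mean)]
    return [sum(c * e for c, e in zip(centered, vector)) for vector in basis]
```

Because power iteration finds the largest remaining eigenvalue on each pass, the returned pairs are already in descending eigenvalue order, so keeping the first k eigenvectors corresponds to the ranking described in the text.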

3.2.1.3 Neural Net Classification. The use of artificial neural networks for face recognition is not uncommon in recent research, and networks have indeed been found to function quite effectively in the face classification task (5, 53, 14, 11, 27, 52). An excellent treatment of the history, development, and function of artificial neural networks is provided by Rogers and Kabrisky, and should be referenced for an in-depth examination of the subject (45). We shall provide a brief look here at the reasons for using a neural net for classification, and describe the network developed for the face recognizer used in this research.


The artificial neural network is meant to emulate, in some sense, the function of the biological brain. Nodes are developed that model the neurons within the brain, permitting some functional transformation on the inputs, and weighted interconnections are established to provide communication between the nodes, somewhat analogously to the way in which the biological neural system provides communication between neurons via axons (neural outputs) and dendrites (neural inputs). Figure 3.4 illustrates the implementation of a single node of a neural network and shows the multiple-input, single-output nature of the artificial (and biological) neuron. This particular implementation is also known as the single layer perceptron, and provides as its output a function of a linear combination of the weighted inputs, with weights indicated by $w_1$ through $w_n$, and a threshold bias. Figure 3.5 shows a complete neural network, with an input layer, an output layer, and one ‘hidden’ layer.
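As a small illustration of the single node just described, the following Python sketch computes a function of the weighted sum of the inputs plus a bias. The names are hypothetical, and the sigmoid default is an assumption chosen to match the activation used later in this chapter.

```python
import math


def sigmoid(a):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Single node: activation of the weighted input sum plus a bias term."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)
```

Passing a different `activation` (for example, the identity function) gives the linear node discussed later for networks with linear outputs.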


Figure 3.4 A single artificial neuron (perceptron).

Figure 3.5 A fully connected, three layer neural net.

A major advantage of artificially implementing such a system is that the transformation, or activation, within each node can be selected by the user, allowing one to effectively implement any mathematical function desired. This ability becomes vital as one attempts to perform classification of data that is not linearly separable; Cybenko has shown that, given sufficient nodes in a single hidden layer and non-linear activations within those nodes, any continuous function can be approximated (12). This implies that, given sufficient resources, any separable data set may be properly classified by such an artificial neural network.

The basic function of a neural network is conceptually fairly straightforward, and generally relies on a training process to determine the appropriate setting of the interconnection weights. Once the net has been trained to produce a specific result when presented with some family, or class, of data, our goal is to be able to present new data to the net, and, if this new data belongs to one of the classes previously trained on, properly classify the new pattern based on the net output.

During  the  training  process,  pattern  vectors  are  presented  one  at  a  time  to  the  input 
nodes  of  the  net  and  allowed  to  propagate  through  the  interconnection  weights,  any  hidden 
nodes (and their associated transformations), and finally through the set of output nodes
(and  their  transformations).  The  output  values  produced  for  each  vector  are  compared 
to  a  desired  set  of  output  values  (determinable  because  we  know  the  classes  to  which  our 
training  vectors  belong)  and  the  net  weights  are  updated  based  on  the  difference  between 
the  desired  and  actual  outputs.  The  update  may  occur  after  each  vector  is  presented,  or 
in  a  batch  manner,  after  all  the  training  data  are  presented.  This  process  is  repeated  until 
the  error  between  the  actual  and  desired  outputs  is  reduced  to  some  desired  level.  At 
this  point,  all  the  interconnection  weights  are  saved,  and  when  we  wish  to  classify  a  new 
data  pattern,  we  rebuild  the  net  using  the  saved  weights,  propagate  the  unknown  pattern 
through  the  net,  and  examine  the  output. 

A simple example would be illustrative at this point. Consider training a net on two classes of data using multiple sample vectors from each class. Because we only have two classes to differentiate, we can establish an output layer consisting of just two nodes. The input layer will be composed of the same number of nodes as there are features (also known as dimensions) in the input pattern, and the hidden layer(s) may contain varying numbers of nodes. If the net is presented a pattern from Class 1, we wish to produce an output value of 1 from output node 1, and 0 from output node 2. When a pattern from Class 2 is presented we wish to see the converse: a value of 0 from output node 1 and 1 from output node 2. The differences between these desired outputs and the actual outputs are calculated, and the weights are updated using some learning rule.

Once we have reduced the differences (across all training patterns) to a satisfactory level, we can present a previously unseen pattern from one of the two classes to the net; a new pattern belonging to a particular class should produce outputs similar to those produced by the training data of the same class. If this is not the case, either the net was not allowed to train to a sufficiently low error value, or the test vector presented was not representative of the class of data on which the net was trained.

Footnote: Determination of the appropriate number of hidden nodes is not always apparent; a discussion of the subject is presented in Appendix A of (45).
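The two-class example can be made concrete with a short Python sketch. The helper names are hypothetical, class indices are assumed to start at zero, and the squared-error measure is the one defined later in this section; deciding by the largest output is an assumption about how the outputs would be read, not a statement of the implemented rule.

```python
def desired_outputs(class_index, num_classes):
    """Desired output vector: 1 at the node for the true class, 0 elsewhere."""
    return [1.0 if k == class_index else 0.0 for k in range(num_classes)]


def output_error(desired, actual):
    """Sum over output nodes of one half the squared difference."""
    return sum(0.5 * (d - z) ** 2 for d, z in zip(desired, actual))


def decide(outputs):
    """Pick the class whose output node responds most strongly."""
    return max(range(len(outputs)), key=lambda k: outputs[k])
```

For a Class 1 pattern (index 0) that produces outputs of 0.8 and 0.3, `decide` selects the first node, and `output_error` gives the residual that training would continue to reduce.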

The face recognition neural network used here is a three-layer (input, output, and one hidden) network from the class of multi-layer perceptrons as shown in Figure 3.5. The inputs to the net are the KLT coefficients calculated from the original face images, and the outputs represent the identities of the persons to be recognized. The learning rule for weight updates on this particular net is based on the common backward error propagation (or back-prop) algorithm, with which the output error is used to propagate a correction to the weights back through the net. The hidden and output layer nodes all use sigmoid functions as their activations, providing non-linear separation capability as well as an easily implemented update rule. Given some error measure E as the output of the net, the generic gradient descent weight update rule is

$$w^{+} = w^{-} - \eta\,\frac{\partial E}{\partial w}$$
and the rule we shall use can be stated simply as

$$w_{ji}^{+} = w_{ji}^{-} - \eta\,\delta_j Y_i$$

where $w_{ji}$ represents the weight connecting node $j$ to node $i$ in the previous layer, $\delta_j$ is an error factor associated with the outputs of node $j$, $Y_i$ is the output of node $i$, and $\eta$ is generally a variable learning rate. $w_{ji}^{+}$ represents the updated weight value, and $w_{ji}^{-}$ represents the weight value before the update. The $\delta$s are easily calculated for a net with $k$ output nodes, $j$ hidden layer nodes, and $i$ input nodes:

Define the error output as

$$E = \sum_k \frac{1}{2}(D_k - Z_k)^2$$

where $D_k$ is the desired output from node $k$ and $Z_k$ is the actual output from node $k$. We wish to minimize this error function, and thus will use the derivative of the function to perform a gradient descent search for the update.


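$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial}{\partial w_{kj}} \sum_k \frac{1}{2}(D_k - Z_k)^2 = (D_k - Z_k)\,\frac{\partial}{\partial w_{kj}}(D_k - Z_k)$$

Notice the derivative is being taken with respect to a specific weight connecting a particular hidden node j with a particular output node k, so the summation over k reduces to only a single value. Let's now rewrite the output Z, recalling that the activations for our hidden and output nodes are sigmoids:

$$Z_k = \frac{1}{1 + e^{-\sum_j w_{kj} Y_j}}$$

where $Y_j$ is the output of hidden node j. Then we can say

$$\frac{\partial}{\partial w_{kj}}(D_k - Z_k) = \frac{\partial}{\partial w_{kj}}\left(D_k - \frac{1}{1 + e^{-\sum_j w_{kj} Y_j}}\right)$$

After some slight mathematical manipulation, we can rewrite the derivative as

$$\frac{\partial}{\partial w_{kj}}(D_k - Z_k) = -Z_k(1 - Z_k)Y_j$$

Therefore

$$\frac{\partial E}{\partial w_{kj}} = -(D_k - Z_k)Z_k(1 - Z_k)Y_j$$

If we now define $\delta_k = -(D_k - Z_k)Z_k(1 - Z_k)$, then we can state the update rule for the weights between hidden layer and output as

$$w_{kj}^{+} = w_{kj}^{-} - \eta_{out}\,\delta_k Y_j$$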


We can derive the update rule for the weights between the input and hidden layers in a similar manner:

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial}{\partial w_{ji}} \sum_k \frac{1}{2}(D_k - Z_k)^2 = \sum_k (D_k - Z_k)\,\frac{\partial}{\partial w_{ji}}(D_k - Z_k)$$

In this case, the summation over k is retained, as the derivative is being taken with respect to a weight connecting nodes i and j.

$$\frac{\partial}{\partial w_{ji}}(D_k - Z_k) = -Z_k(1 - Z_k)\,w_{kj}\,Y_j(1 - Y_j)X_i$$

So

$$\frac{\partial E}{\partial w_{ji}} = -\sum_k (D_k - Z_k)(Z_k)(1 - Z_k)(w_{kj})(Y_j)(1 - Y_j)X_i$$

We can now denote an error factor $\delta_j$ in terms of the output delta calculated previously, $\delta_k = -(D_k - Z_k)Z_k(1 - Z_k)$:

$$\delta_j = \sum_k \delta_k\,w_{kj}\,(Y_j)(1 - Y_j)$$

This allows us to state our update rule for the weights between the input and hidden layer as

$$w_{ji}^{+} = w_{ji}^{-} - \eta\,\delta_j X_i$$
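The two update rules derived above can be checked numerically with a short Python sketch. This is an illustration under stated assumptions rather than the network code used in this research: the function names are hypothetical, bias terms are omitted to mirror the derivation, a single learning rate `eta` is assumed for both layers, and the error factors are taken as $\delta_k = -(D_k - Z_k)Z_k(1 - Z_k)$ and $\delta_j = \sum_k \delta_k w_{kj} Y_j(1 - Y_j)$.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_hid, w_out):
    """Return hidden outputs Y and net outputs Z for one input vector X."""
    y = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    z = [sigmoid(sum(w * yj for w, yj in zip(row, y))) for row in w_out]
    return y, z


def error(x, d, w_hid, w_out):
    """E = sum over k of 1/2 (D_k - Z_k)^2 for one training vector."""
    _, z = forward(x, w_hid, w_out)
    return sum(0.5 * (dk - zk) ** 2 for dk, zk in zip(d, z))


def backprop_step(x, d, w_hid, w_out, eta=0.5):
    """One gradient descent update; returns new weights and both delta sets."""
    y, z = forward(x, w_hid, w_out)
    num_out, num_hid = len(w_out), len(w_hid)
    # Output layer: dE/dw_kj = delta_k * Y_j
    delta_out = [-(dk - zk) * zk * (1.0 - zk) for dk, zk in zip(d, z)]
    # Hidden layer: dE/dw_ji = delta_j * X_i
    delta_hid = [
        sum(delta_out[k] * w_out[k][j] for k in range(num_out)) * y[j] * (1.0 - y[j])
        for j in range(num_hid)
    ]
    new_out = [[w_out[k][j] - eta * delta_out[k] * y[j] for j in range(num_hid)]
               for k in range(num_out)]
    new_hid = [[w_hid[j][i] - eta * delta_hid[j] * x[i] for i in range(len(x))]
               for j in range(num_hid)]
    return new_hid, new_out, delta_hid, delta_out
```

Comparing `delta_out[k] * y[j]` and `delta_hid[j] * x[i]` against a finite-difference estimate of the error gradient is a quick way to confirm that the signs in the derivation are consistent.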

Note that we may also have implemented our neural net with linear outputs while retaining the sigmoidal hidden nodes. For that case, the update rule for the weights between the input and hidden layers will remain the same, but the rule will change for updating the weights between the hidden and output layers. The error function will still be

$$E = \sum_k \frac{1}{2}(D_k - Z_k)^2$$

but the output Z at each node will now be represented by

$$Z_k = \sum_j w_{kj} Y_j$$

leading to the derivative with respect to $w_{kj}$

$$\frac{\partial E}{\partial w_{kj}} = -(D_k - Z_k)Y_j$$

With the new definition $\delta_k = -(D_k - Z_k)$, our update rule for the net with linear outputs is still

$$w_{kj}^{+} = w_{kj}^{-} - \eta_{out}\,\delta_k Y_j$$

Figures 3.6 and 3.7 illustrate the function of the complete face recognition system used as a building block for our identity verification implementation. Figure 3.6 shows the process of determining the appropriate lower-dimensionality feature space for the given training population, and Figure 3.7 demonstrates the actual recognition process. The faces are first manually centered in the input images, then an eigen-analysis is performed on the training population to determine an orthogonal feature space into which to project the images. The dimensions to be retained become the new basis set, and are known as eigenfaces, because they represent the eigenvectors of the covariance matrix of the original training population. Figure 3.8 shows examples of eigenfaces extracted from a training set consisting of six people. Using this basis set, coefficients are extracted from each image and presented to the neural net for training. Each node of the output represents one person to be recognized, and once the net is trained, presentation of a new instance of one of the faces the system was trained on should result in the ‘firing’ of the appropriate node.

Footnote: Recall that a collateral thesis by McCrae is solving the automatic segmentation problem (36).

Figure 3.6 Determination of the orthogonal feature space.

Figure 3.7 The face recognition system used in this research.

Figure 3.8 Four example eigenfaces produced by a training set consisting of six examples each of six different people.

3.2.2 Speaker Identifier. The base speaker identifier used for this verification system was developed at AFIT by Colombi in a 1992 thesis effort, and is based on the extraction of cepstral coefficients from individual utterances (9). These coefficients are used to build a codebook containing a set of code vectors (coefficient vectors) representative of the vocal frequency range of the person to be identified. When that person presents new utterances to the system for recognition, cepstral coefficients are extracted from the new speech and a distortion metric is used to measure the distance between the new vectors and those in the codebook. An overview of the system is provided in the following section.

3.2.2.1 Training the System. Figure 3.9 illustrates the procedure followed to train the system for a particular individual. Using the audio input capabilities of the Sun SPARCstation, frames of approximately 30 milliseconds of speech (overlapped by 50%, as illustrated in Figure 3.10) are digitally sampled at 8000 samples/second, and then

Footnote: Farni provides an excellent discussion of the theory and application of Cepstral coefficient-based speech processing in (15).


Figure 3.10 Illustration of the overlap of speech frames (each frame approximately 30 milliseconds long).

Twenty  Cepstral  coefficients  are  extracted  from  each  of  the  resultant  frames  by  the 
Entropic  Signal  Processing  System  (ESPS®)  software  package,  and  then  ESPS  is  used  to 
determine  a  probability  of  voicing  factor  for  each  of  these  frames.  A  probability  of  voicing 
at  or  above  some  threshold  indicates  the  presence  of  formants,  which  provide  a  better 
correlation  with  an  individual’s  vocal  tract  than  do  fricatives,  and  using  such  a  threshold 
will  allow  us  to  discard  the  frames  containing  only  fricatives  or  silence.  All  of  the  frames 
that  are  retained  are  then  presented  to  a  Linde-Buzo-Gray  clustering  algorithm,  which 
is  used  to  transform  the  initial  coefficient  vectors  into  a  set  of  64  new  cluster  centers,  or 
codewords,  which  are  placed  in  a  codebook  representing  the  individual  (30). 
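The framing and voicing-based frame selection described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the voicing probabilities are assumed to be supplied by an external tool (ESPS in this research), and the threshold value is left to the caller because the text does not state one.

```python
def frame_signal(samples, sample_rate=8000, frame_ms=30.0, overlap=0.5):
    """Split a sampled signal into fixed-length frames with fractional overlap.

    At 8000 samples/second a 30 ms frame holds 240 samples, and a 50%
    overlap advances the frame start by 120 samples each time.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    last_start = len(samples) - frame_len
    return [samples[start:start + frame_len]
            for start in range(0, last_start + 1, hop)]


def keep_voiced(frames, voicing_probabilities, threshold):
    """Retain only frames whose probability of voicing meets the threshold."""
    return [frame for frame, prob in zip(frames, voicing_probabilities)
            if prob >= threshold]
```

Trailing samples that do not fill a complete frame are dropped in this sketch; whether the original system padded or discarded them is not stated in the text.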

3.2.2.2 Speaker Recognition. The first stages of the recognition process are identical to those of the training process. The individual to be recognized will provide an utterance to the system, and the speech will be put into frames, digitized, and Cepstral processed, and the frames with a sufficient probability of voicing will be retained. Each of these new vectors will be presented to each of the codebooks representing the training population, and distortion measures will be made of the distances from the new speech to each of the codebooks. The distortion measure from each codebook is calculated by

$$\text{distortion} = \sum_{j=1}^{N} \min_{1 \le i \le M}\left[d(\mathbf{x}_j, \mathbf{v}_i)\right]$$

where N is the number of new speech vectors, M is the number of codewords in the codebook, and $\min_{1 \le i \le M}[d(\mathbf{x}_j, \mathbf{v}_i)]$ is the minimum of the Euclidean distances between the new vector $\mathbf{x}_j$ and the codebook vectors $\mathbf{v}_i$. The new speaker is then recognized as the individual whose codebook corresponds to the lowest distortion. Xu showed that pseudo post-probabilities could be calculated from the results of the distortion measurement process through the simple metric

$$P_k(i) = \frac{1/\text{distortion}_k(i)}{\sum_{l=1}^{M} 1/\text{distortion}_k(l)}$$

where i is the class under consideration, k is the classifier (in this case, one of the codebooks), and M is the total number of classes (66).
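A minimal Python sketch of the distortion measure and the pseudo post-probabilities follows. The function names are hypothetical, and the sketch assumes every distortion is strictly positive (a zero distortion would cause a division by zero and would need special handling).

```python
import math


def distortion(new_vectors, codebook):
    """Sum, over the new vectors, of the distance to the nearest codeword."""
    return sum(min(math.dist(x, v) for v in codebook) for x in new_vectors)


def pseudo_posteriors(distortions):
    """Normalized inverse distortions, one value per class (codebook)."""
    inverses = [1.0 / value for value in distortions]
    total = sum(inverses)
    return [inv / total for inv in inverses]


def identify(new_vectors, codebooks):
    """Index of the codebook giving the lowest distortion."""
    scores = [distortion(new_vectors, book) for book in codebooks]
    return min(range(len(scores)), key=lambda i: scores[i])
```

Because the inverse distortions are normalized by their sum, the resulting values lie between 0 and 1 and add to 1, which is what allows them to be treated as probability-like scores.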


3.3 Conversion From Identification to Verification

Verification is generally an easier problem to solve than identification, but we must recognize that the two processes are distinct, and can be approached differently.

For identification, we generally begin with a database containing the identities of known individuals and information that can be used to classify new instances of these individuals; the identity of a new instance will be selected from the database based on some set of characteristics and some selection rule. The assumption made is that the person to be identified is present in the database already, or put another way, that the database contains the entire population of possible targets. To overcome this limitation, it is possible to set an acceptance threshold so that an identification will be made only if some minimum error criterion is met; it would be perfectly valid to perform a statistical analysis of the performance of the system for each individual in the database to determine such a threshold. This can become quite cumbersome, however, as the number of individuals and the amount of information in the database increase.

In the verification problem, the assumption is again inherent that the database contains information about every possible target, but we now have a different classification criterion: we may either decide that the target is who he claims to be, or is not, and need not concern ourselves with who he actually is. As with identification, we could again statistically determine thresholds to establish confidence in the performance of our verification system, but that would still be a computationally intensive and time consuming task. We would prefer to find a way to actually represent the possible target population as a whole, and to train our system to determine what makes a new instance of a known target different from another member of that target population.

The approach taken in this research is to attempt to do just that: model the average population and train our verification system to recognize what makes an individual stand out from that model. We will use the same back-prop neural net as was used for face recognition for this effort, but instead of training it to differentiate one individual from another, we will train it to differentiate each individual from the entire set of other individuals; one could say that we are treating all the non-target individuals as different instances of a single person. The hope is that, given enough individuals in the database, we will be able to successfully model the ‘average world’ person, and thus discriminate the target from this person.

3.3.1 Face Verifier. To provide a face verification capability, we will break down our population of training images into two classes: the person to be verified and everyone else. If we begin with a training set consisting of k prototypes each of N individuals, then one training class will consist of k images, and the other of (N - 1)k images. As with the base system, the training population will be used to form an orthogonal space, and KLT coefficients for each image will be extracted. These coefficients will be presented to the neural net, along with a tag identifying each set as either belonging to the target individual or not, and the net will be trained to provide proper classification.

Footnote: We may envision the statistical threshold analysis as requiring ‘many’ new instances of each individual to be presented to the system, and recording the number of correct and false recognitions. A threshold could be established to minimize the error for each individual based on this analysis.

After the net is trained for a specific individual, the weights calculated will be stored in a file specific to him, and will be used to rebuild the net whenever an individual presents himself to the system claiming to be that person. Ruck et al. have shown that the outputs of a multi-layer perceptron configured as ours will approximate the a posteriori probabilities of being in a specified class (50). For the recognition problem, where equal numbers of prototypes for each class are presented for training, the probability of being in a certain class is simply

$$P(\text{class } m \mid \mathbf{x}) = \frac{Z_m}{\sum_{k=1}^{K} Z_k}$$

where $Z_m$ is the output of a specific node in the output layer, and K is the number of nodes in that layer. A source of bias exists in the verification case in that there will be many more instances of ‘not-Joe’ than there will be of ‘Joe’ presented to the net for training (in fact there will be N - 1 times as many instances). This bias must be accounted for at the net output if we wish to classify in terms of post-probabilities, which will be quite useful when attempting to fuse our two base verification systems. Hush and Horne have shown a simple technique for compensating for such a bias.

If the training set distribution does not accurately reflect the actual a priori probabilities, the network outputs can be scaled to compensate ... The proper adjustment can be made by scaling the estimate of $P(\omega_i|\mathbf{x})$ by $P(\omega_i)/P_t(\omega_i)$,

where $P(\omega_i)$ is the true a priori probability, and $P_t(\omega_i)$ is the a priori probability implied by the training set distribution (24).

In our case, we will scale the probability of a correct verification calculated at the output of the neural net by

$$\frac{0.5}{(\text{number of target vectors})/(\text{number of non-target vectors})} = \frac{0.5 \times \text{number of non-target vectors}}{\text{number of target vectors}}$$
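The bias compensation can be illustrated with a brief Python sketch. It implements the scale factor exactly as written above and nothing more; the function names are hypothetical, and any renormalization of the scaled value (which can exceed 1) is not described in the text and is therefore not attempted here.

```python
def training_set_sizes(prototypes_per_person, num_people):
    """Target and non-target vector counts: k and (N - 1) * k."""
    return prototypes_per_person, (num_people - 1) * prototypes_per_person


def bias_scale(num_target, num_nontarget, true_prior=0.5):
    """Scale factor: true prior divided by the target to non-target ratio."""
    return true_prior / (num_target / num_nontarget)


def scaled_target_output(target_output, num_target, num_nontarget):
    """Apply the scale factor to the net's 'target' output."""
    return target_output * bias_scale(num_target, num_nontarget)
```

With k prototypes for each of N people the factor reduces to 0.5(N - 1), so for six prototypes each of six people it is 2.5.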

3.3.2 Speaker Verifier. The original speaker identifier classifies by selecting the codebook resulting in the lowest distortion measure between that codebook and a set of test vectors. Because we wish to project the classification information into a probabilistic space, we will again use a neural net classifier. We will retain the earlier stages of the system, extracting Cepstral coefficients and developing a codebook for each person, but will then present the vectors within the codebook to a neural net, tagging the vectors in the target codebook as belonging to one class, and all other codebooks’ vectors as belonging to another. Similarly to what we did with the face verifier, we wish to, in effect, build a composite ‘not-Joe’ to train against the person to be verified.

The vectors within all the training codebooks will again be used to form an orthogonal space into which to project each of the targets, and the net will be trained on the extracted KLT coefficients exactly as with the face verifier. When new instances of a person are presented, the speech will be converted into a new test codebook, and verifying the speaker’s identity will be accomplished in the same manner as verifying faces, including the compensation for training bias produced by training on more ‘not-Joe’ vectors than ‘Joe’ vectors.

3.4 Fusion of Face and Speaker Verifiers

Though many options exist for fusing multiple sensor outputs, for this implementation we will only look at simple linear combinations of the post-probabilities output from our two neural nets. In other words, we will attempt to minimize the classification error by choosing the appropriate combination of a posteriori probabilities.
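One way to picture such a linear combination is the Python sketch below. It is an assumption-laden illustration, not the fusion rule actually evaluated: the single weight `alpha`, the fixed decision threshold of 0.5, the grid search, and all function names are hypothetical choices made for the example.

```python
def fuse(p_face, p_speech, alpha):
    """Weighted linear combination of the two verifier outputs."""
    return alpha * p_face + (1.0 - alpha) * p_speech


def count_errors(scores, labels, threshold=0.5):
    """Number of decisions (accept if score >= threshold) that disagree with labels."""
    return sum((score >= threshold) != bool(label)
               for score, label in zip(scores, labels))


def best_alpha(face_probs, speech_probs, labels, steps=20, threshold=0.5):
    """Grid search over alpha in [0, 1] for the fewest classification errors."""
    best = None
    for n in range(steps + 1):
        alpha = n / steps
        fused = [fuse(f, s, alpha) for f, s in zip(face_probs, speech_probs)]
        errors = count_errors(fused, labels, threshold)
        if best is None or errors < best[1]:
            best = (alpha, errors)
    return best
```

Here `labels` marks each trial as a true claimant (1) or an impostor (0); `alpha = 1` reduces to the face verifier alone and `alpha = 0` to the speaker verifier alone, so the search includes both base systems as special cases.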

3.5 Potential Improvements to the Face Verifier

The existing face identification system relies on the KLT process to reduce the dimensionality of the feature set and to project the images into an orthogonal feature space. The number of features is further reduced by selecting the desired number of eigenvectors corresponding to the highest eigenvalues of the training set covariance matrix. With faces, previous work demonstrated that the top five or six eigen-coefficients were adequate for providing good classification performance under fairly controlled conditions, but no attempt was made to determine if we were actually selecting the dimensions with the most classification information. Recall that the KLT process chooses as the optimal dimensions those that have the maximum variance across the entire training population. Though the higher variance does imply a higher amount of information, the information is not necessarily most suitable for differentiation between classes. Instead, this type of information is more suited to reconstruction of some initial pattern using a reduced number of features; reconstructing with a given number of eigen-coefficients (in descending order from the maximum eigenvalue) guarantees that there is no other combination of coefficients that will result in a lower mean-squared reconstruction error (63:275).

Footnote: Note that we could train our speaker verification net using the Cepstral vectors prior to clustering into codebooks, but the clustering process was designed to capture the essence of the speaker; any small loss of information is felt to be more than offset by the elimination of the overhead of processing hundreds of ‘raw’ vectors (versus 64 for the codebook) for each individual.

For the classification problem, we wish to determine if there is a method of dimension selection that is more appropriate to the multi-class problem. Forming an orthogonal space based on the training population is still appropriate, so that process will be retained, as will the projection of the training patterns into that space. As an alternative to the KLT selection method, we would like to select the dimensions that correspond to the lowest amount of in-class variance, as well as the highest amount of out-of-class variance. This metric should result in retaining the dimensions that have the most information about the difference between different classes. Such a metric is the F ratio, which is defined as the ratio of the variance of the class means to the mean of the within-class variances. Taking n instances of each of m different classes results in the F ratio

$$F = \frac{\frac{1}{m-1}\sum_{j=1}^{m}(\mu_j - \mu)^2}{\frac{1}{m(n-1)}\sum_{j=1}^{m}\sum_{i=1}^{n}(x_{ij} - \mu_j)^2}$$

where $\mu$ is the mean of all measurements across all classes, $\mu_j$ is the mean of all measurements for Class j, and $x_{ij}$ is the ith measurement of Class j. We will implement this figure of merit in both of the verification systems we develop and test the performance of the systems using dimensions selected via this metric versus those selected by eigenvalue order.
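A short Python sketch of the F ratio as a dimension-ranking figure of merit is given below. It assumes the standard form of the ratio (variance of the class means over the mean within-class variance) with equal numbers of measurements per class; the function names and the data layout are hypothetical.

```python
def f_ratio(classes):
    """F ratio of one dimension; classes[j] holds the n measurements of class j."""
    m = len(classes)
    n = len(classes[0])
    class_means = [sum(values) / n for values in classes]
    overall_mean = sum(class_means) / m
    between = sum((mu_j - overall_mean) ** 2 for mu_j in class_means) / (m - 1)
    within = sum((x - mu_j) ** 2
                 for values, mu_j in zip(classes, class_means)
                 for x in values) / (m * (n - 1))
    # A dimension with zero within-class variance would divide by zero here.
    return between / within


def rank_dimensions(data):
    """Order dimension indices by descending F ratio.

    data[j][i] is the i-th feature vector (e.g. KLT coefficients) of class j.
    """
    num_dims = len(data[0][0])
    scores = [f_ratio([[vector[dim] for vector in class_vectors]
                       for class_vectors in data])
              for dim in range(num_dims)]
    return sorted(range(num_dims), key=lambda dim: scores[dim], reverse=True)
```

Selecting the first few indices returned by `rank_dimensions` would then take the place of selecting the coefficients with the largest eigenvalues.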

Recognizing  that  we  are  not  restricted  to  selecting  the  ‘important’  features  by  linear 
means  only  (both  eigenvalue  ordering  and  the  F  ratio  provide  linear  selection  criteria), 
we  can  also  attempt  to  non-linearly  transform  our  set  of  orthogonal  dimensions  into  a 
new, reduced space. Using a sigmoidal back-prop net similar to the one we are using for verification, we will explore the possibility of increasing classification performance by training the net to maximize the distance between the classes.

In order to use a neural net for such a dimensional transformation, we must derive a rule for updating the weights. Many update methodologies exist, but we will develop one that will transform the initial multi-class data (two-class in the verification case) into a more easily separable, and thus more easily classifiable, space. The rule shall be based on maximizing the separation between elements of the different classes, and the error from which it is derived can be stated as

$$E = \sum_k \frac{1}{2}\left(Z_k^1 - Z_k^2\right)^2$$

where $Z_k^l$ is the output from output node k when the net is presented a vector from Class $l$, and E is a measure of the total distance between the net outputs when presented with vectors first from Class 1, then from Class 2. Maximizing this function should maximize the distance between the members of different classes. We can use gradient ascent to accomplish our goals because we are maximizing. To derive our update rule, we shall begin by finding the partial derivatives of the output ‘error’ with respect to the various weights. For the weights connecting node j in the hidden layer to node k in the output layer

$$\frac{\partial E}{\partial w_{kj}} = \left(Z_k^1 - Z_k^2\right)\left[Z_k^1\left(1 - Z_k^1\right)Y_j^1 - Z_k^2\left(1 - Z_k^2\right)Y_j^2\right]$$

Then the update rule can be stated:

$$w_{kj}^{+} = w_{kj}^{-} + \eta_{out}\left(\delta_k^1 Y_j^1 - \delta_k^2 Y_j^2\right)$$

where $\delta_k^l = \left(Z_k^1 - Z_k^2\right)Z_k^l\left(1 - Z_k^l\right),\ l = 1, 2$
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The  update  rule  for  the  weights  connecting  the  tth  input  node  to  the  jth  hidden 
node  can  be  found  in  a  similar  manner: 


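$$\frac{\partial E}{\partial w_{ji}} = \sum_k \left(Z_k^1 - Z_k^2\right)\left(\frac{\partial Z_k^1}{\partial w_{ji}} - \frac{\partial Z_k^2}{\partial w_{ji}}\right)$$

$$= \sum_k \left(Z_k^1 - Z_k^2\right)\left[Z_k^1\left(1 - Z_k^1\right) w_{kj}\, Y_j^1\left(1 - Y_j^1\right) X_i^1 - Z_k^2\left(1 - Z_k^2\right) w_{kj}\, Y_j^2\left(1 - Y_j^2\right) X_i^2\right]$$

$$= \sum_k \left[\left(Z_k^1 - Z_k^2\right) w_{kj}\right]\left[Z_k^1\left(1 - Z_k^1\right) Y_j^1\left(1 - Y_j^1\right) X_i^1 - Z_k^2\left(1 - Z_k^2\right) Y_j^2\left(1 - Y_j^2\right) X_i^2\right]$$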

This  then  leads  to  the  weight  update  rule 

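$$w_{ji}^{+} = w_{ji}^{-} + \eta_{hid} \sum_k \left(\delta_{jk}^1 X_i^1 - \delta_{jk}^2 X_i^2\right)$$

where $\delta_{jk}^l = \left(Z_k^1 - Z_k^2\right)\left(w_{kj}\right) Z_k^l\left(1 - Z_k^l\right) Y_j^l\left(1 - Y_j^l\right)$, $l = 1, 2$.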

Note that the derivation performed above assumes only two classes are being examined;
in the case of our verification problems, this would be an appropriate solution. The
error  term  can  easily  be  generalized  to  m  classes: 

$$E = \sum_k \sum_m \sum_{m'} \frac{1}{2}\left(Z_k^m - Z_k^{m'}\right)^2$$

where the summations over $m$ and $m'$ will account for all the distances between the outputs
of  each  member  of  one  class  and  each  member  of  every  other  class.  The  weight  update 
rules  can  be  derived  in  a  manner  similar  to  that  shown  above  for  two  classes. 
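
The two-class update rules above can be sketched in NumPy as a single gradient-ascent step. This is a minimal sketch, not the thesis software described in Appendix A: the function names, the learning rates, the omission of bias terms, and the use of one Class 1 vector and one Class 2 vector per step are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, w_hid, w_out):
    """Return hidden outputs Y and net outputs Z for one input vector x."""
    y = sigmoid(w_hid @ x)
    z = sigmoid(w_out @ y)
    return y, z

def separation(x1, x2, w_hid, w_out):
    """E = sum_k 0.5 * (Z_k^1 - Z_k^2)^2 for one Class 1 / Class 2 pair."""
    _, z1 = forward(x1, w_hid, w_out)
    _, z2 = forward(x2, w_hid, w_out)
    return 0.5 * np.sum((z1 - z2) ** 2)

def ascent_step(x1, x2, w_hid, w_out, eta_out=0.1, eta_hid=0.1):
    """One gradient-ascent update of both weight layers (rates are assumed)."""
    y1, z1 = forward(x1, w_hid, w_out)
    y2, z2 = forward(x2, w_hid, w_out)
    diff = z1 - z2
    d1 = diff * z1 * (1.0 - z1)            # delta_k^1
    d2 = diff * z2 * (1.0 - z2)            # delta_k^2
    grad_out = np.outer(d1, y1) - np.outer(d2, y2)
    # Sum over output nodes k of delta_k^l * w_kj, then the hidden sigmoid slope.
    h1 = (w_out.T @ d1) * y1 * (1.0 - y1)
    h2 = (w_out.T @ d2) * y2 * (1.0 - y2)
    grad_hid = np.outer(h1, x1) - np.outer(h2, x2)
    return w_hid + eta_hid * grad_hid, w_out + eta_out * grad_out
```

Because the step adds the gradient rather than subtracting it, a sufficiently small learning rate should increase the separation measure, which is the behavior the derivation is aiming for.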


This  chapter  has  provided  an  overview  of  the  methodology  to  be  used  in  the  performance 
of  this  research.  The  details  of  the  actual  implementation  and  the  software  generated  and 
modified are presented in Appendix A. The next chapter provides the results from the
experimentation accomplished in the course of this research. Performance of the individual
verifiers under different conditions will be detailed, as will the performance of the overall
identity verification system.

IV.  Results


4.1  Introduction 

This chapter contains the results obtained during this thesis research. Because a
major  thrust  of  this  work  has  been  to  determine  if  the  dimensional  reduction  achieved 
through  the  traditional  KLT  methodology  is  well  suited  to  face  recognition/verification, 
we  shall  first  examine  the  results  of  applying  the  Figure  of  Merit  (FoM)  described  in 
Chapter  3  to  a  contrived  ‘toy’  problem.  In  this  way,  we  will  determine  the  effectiveness  of 
the  metric  on  a  problem  with  a  known  data  distribution,  and  if  it  is  shown  to  be  effective 
we  will  extend  its  use  to  actual  face  and  speaker  data. 

We will then examine the face verification problem, focusing again on the dimensionality
reduction task, comparing clustering and accuracy obtained by selecting dimensions
based  on  FoM,  eigenvalue,  and  the  dimensions  resulting  in  the  lowest  training  error  for  a 
set  number  of  training  epochs.  We  will  perform  the  same  comparison  using  speaker  data, 
and  will  use  the  results  from  the  face  and  speaker  methodology  comparisons  to  implement 
the  two  components  of  the  identity  verification  system  using  the  technique  best  suited  to 
each. 

Next,  we  will  examine  the  performance  of  the  nonlinear  dimensional  transformation 
method  on  toy  and  actual  data.  In  this  examination,  we  will  attempt  to  determine  the 
viability  of  such  a  metric. 

We  will  finally  look  at  the  verification  accuracy  resulting  from  the  fusion  of  the  face 
and  speaker  verification  systems.  We  will  first  determine  the  optimal  linear  combination 
of  individual  verifier  outputs  that  will  provide  the  highest  training  accuracy,  and  will  then 
use  that  combination  to  test  the  system  against  new  data. 

4.2  The  Figure  of  Merit 

Recall that the Figure of Merit (FoM) was developed through a straightforward
process:




•  Calculate the orthogonal eigenspace associated with the entire population of training
samples by determining the eigenvectors of the covariance matrix of the sample
population.

•  Project each individual class into the new eigenspace using simple matrix multiplication.

•  Determine  the  means  and  variances  of  each  individual  class  within  the  orthogonal 
space,  as  well  as  the  variance  of  the  class  means  and  the  mean  of  the  class  variances. 

•  Calculate the figure of merit using the following formula:

$$\text{FoM} = \frac{\text{variance of interclass means}}{\text{mean of intraclass variances}}$$

This  FoM  represents  a  measure  of  how  separable  the  individual  classes  are  within  the 
orthogonal  space  in  relation  to  how  tightly  clustered  the  elements  of  each  class  are.  A 
higher  value  will  indicate  better  separability. 
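
The four steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis implementation: the function name `figure_of_merit`, the use of NumPy, and the choice of population (not sample) variance within each class are all assumptions, and one FoM value is returned per eigen-dimension.

```python
import numpy as np

def figure_of_merit(samples, labels):
    """Return (eigvals, eigvecs, fom), with one FoM value per eigen-dimension."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    centered = samples - samples.mean(axis=0)

    # Step 1: eigenvectors of the covariance matrix of the whole population,
    # reordered so the largest eigenvalue comes first (the usual KLT order).
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 2: project every sample into the eigenspace.
    coeffs = centered @ eigvecs

    # Step 3: per-class means and variances along each eigen-dimension.
    classes = np.unique(labels)
    class_means = np.array([coeffs[labels == c].mean(axis=0) for c in classes])
    class_vars = np.array([coeffs[labels == c].var(axis=0) for c in classes])

    # Step 4: variance of the class means over the mean of the class variances.
    fom = class_means.var(axis=0) / class_vars.mean(axis=0)
    return eigvals, eigvecs, fom
```

Selecting dimensions by FoM then amounts to sorting the eigen-dimensions by the returned `fom` values instead of by `eigvals`.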

To test this FoM, we developed a toy problem using a two-class, three-dimensional
environment. Using MATLAB®, 50 sample points were used to generate a uniform distribution
with dimensions along the x, y, and z axes of 5 units by 1 unit by 1 unit. A copy of this
distribution was then made and shifted along the y and z axes so that the two classes
were parallel along the x axis (Figure 4.1).

Figure 4.1  Toy problem distribution.

This type of class distribution will provide a basis for a comparison between the
traditional KLT method and the FoM approach. Using the KLT, the importance of each
eigenvector calculated will be directly proportional to the variance of each dimension among
the entire sample population; the dimension with the highest variance, in this case along
the x axis, will be the most important. However, we can easily see there is no information
in the x-dimension that allows us to discriminate between the two classes, so using the
KLT and one eigen-coefficient should result in poor discriminator performance. Using the
FoM approach, however, more importance is placed on those dimensions having a higher
class mean separation in relation to the class variances. In this case, either the
y- or z-dimension clearly offers more information than the x-dimension for class discriminability.

The training data were processed through the two different algorithms, and a back-prop
neural net was trained for each with one eigen-coefficient, using half the data points
from each distribution for training. The classification results using a single coefficient from
the remaining data points are shown in Table 4.1. Note that, as expected, the KLT method
does not allow discrimination between the classes using only one coefficient, while the FoM
method does indeed allow us to classify the test data with a high probability (recall that
the outputs of the neural net can be directly related to pseudo post-probabilities).

Table 4.1  Classification accuracy of KLT vs FoM.

                                        KLT     FoM
Mean Probability of Correct
Classification (one coefficient)        0.22    [not legible]
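
The toy problem of this section can be approximated in a few lines to show why eigenvalue ordering favors the x axis. This is a sketch under assumptions, not the original MATLAB code: the shift of 1.2 units along y and z, the random seed, and all function names are hypothetical, since the thesis does not state the shift that was used.

```python
import numpy as np

def toy_problem(n_points=50, shift=1.2, seed=0):
    """Two classes: a 5 x 1 x 1 uniform box and a copy shifted along y and z."""
    rng = np.random.default_rng(seed)
    class_a = rng.uniform(0.0, 1.0, size=(n_points, 3)) * np.array([5.0, 1.0, 1.0])
    class_b = class_a + np.array([0.0, shift, shift])  # assumed shift value
    return class_a, class_b

def klt_basis(samples):
    """Eigenvectors of the sample covariance, ordered by descending eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(samples, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

class_a, class_b = toy_problem()
eigvals, eigvecs = klt_basis(np.vstack([class_a, class_b]))

# Distance between the two class means along each eigen-dimension.
mean_gap = np.abs((class_b.mean(axis=0) - class_a.mean(axis=0)) @ eigvecs)
```

With these settings the leading eigenvector lies almost entirely along x, where `mean_gap` is close to zero, while the second eigenvector carries nearly all of the separation between the class means.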

4.3  Face Verification Dimensionality Reduction

In  this  section  we  shall  examine  the  characteristics  and  importance  of  the  different 
dimensions  resulting  from  the  orthogonalization  process  on  face  data. 

4.3.1  Clustering in Two Dimensions.  To get a feeling for how our data will
cluster using the different dimensionality reduction schemes, we will examine six classes
of faces composed of five prototypes each. We will project these faces into an orthogonal
eigenspace, but will retain only two of the thirty possible dimensions and plot all the
extracted eigen-coefficients on a single plot. Retaining the two bases corresponding
to the dimensions producing the lowest training error¹ resulted in the cluster plots of Figures
4.2 (clustering of the same data on which the net was trained) and 4.3 (clustering of test
data not previously presented to the net). Similar plots are presented in Figures 4.4 and
4.5 for coefficients selected based on eigenvalue order, and Figures 4.6 and 4.7 for those
based on our figure of merit. Figure 4.8 provides an example of an original face in the
training set and the reconstruction of that face using two eigen-dimensions selected by
each of our three different schemes.


¹The minimum error dimensions were found by training the net on one dimension at a time until one
was found producing the lowest error, repeating this process while retaining the first dimension, and so on.
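
The footnote describes a greedy forward selection, which can be sketched as below. The function `greedy_min_error_dims` and its `training_error` argument are hypothetical names: `training_error` stands in for training the net on a candidate list of dimensions and returning the resulting error, which is not reproduced here.

```python
def greedy_min_error_dims(n_dims, n_keep, training_error):
    """Forward selection: repeatedly add the dimension that gives the lowest
    training error when combined with the dimensions already retained."""
    chosen = []
    for _ in range(n_keep):
        remaining = [d for d in range(n_dims) if d not in chosen]
        best = min(remaining, key=lambda d: training_error(chosen + [d]))
        chosen.append(best)
    return chosen
```

Each pass retrains once per remaining dimension, so keeping two of thirty dimensions costs 30 + 29 training runs instead of one run for every possible pair.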




Figure  4.2  Clustering  of  six  training  classes  (faces)  using  the  eigen-dimensions  producing 
the  minimum  training  error. 


Figure 4.3  Clustering of six test classes (faces) using the eigen-dimensions producing the
minimum training error.

Figure 4.4  Clustering of six training classes (faces) using the eigen-dimensions
corresponding to the two highest eigenvalues.

Figure 4.5  Clustering of six test classes (faces) using the eigen-dimensions corresponding
to the two highest eigenvalues.

Figure 4.6  Clustering of six training classes (faces) using the eigen-dimensions
corresponding to the two highest Figures of Merit.

Figure 4.7  Clustering of six test classes (faces) using the eigen-dimensions corresponding
to the two highest Figures of Merit.

Figure 4.8  Original training image (a) reconstructed using two dimensions selected by
minimum error (b), eigenvalue (c), and figure of merit (d).

Table 4.2  Face Acceptance/Rejection Accuracy (%)

Type          Num Dimensions   Min-error   Eigenvalue   FoM
True Accept         2            100.0       100.0      80.0
True Reject         2             76.0        77.3      43.3
Overall             2             80.0        81.1      49.4
True Accept         4            100.0       100.0      83.3
True Reject         4             91.3        85.3      49.3
Overall             4             92.8        87.8      55.0
True Accept         6            100.0       100.0      76.7
True Reject         6             94.0        96.0      61.3
Overall             6             95.0        96.7      63.9
True Accept         8            100.0       100.0      93.3
True Reject         8             96.7        96.0      60.7
Overall             8             97.3        96.7      66.1
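
The Overall rows of Table 4.2 are consistent with a simple weighting of the other two rows. As a worked check, assume each of the six nets is tested on all 30 test images, so that 5 are genuine instances of the claimed identity and 25 are imposters; this test layout is inferred from the six-class, five-instance test set and is not stated explicitly. Writing TA for the True Accept rate and TR for the True Reject rate:

```latex
\text{Overall} = \frac{5\,\text{TA} + 25\,\text{TR}}{30}
              = \frac{\text{TA} + 5\,\text{TR}}{6},
\qquad
\text{e.g. eigenvalue, six dimensions: }
\frac{100.0 + 5(96.0)}{6} \approx 96.7 .
```

Because imposters outnumber genuine instances five to one under this assumption, the overall figure is dominated by the True Reject rate.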

Note that for the first two selection methodologies, the six classes form quite distinct
clusters for both training and testing data, implying that classification should be fairly
easy. The FoM plots, however, show no such obvious separability. The distribution of the
data apparently does not lend itself to separability using such a process. The implication
is that selecting dimensions based on the FoM may not provide the performance desired,
but because we are only looking at two dimensions out of the possible 30, the real test will
be to compare classification accuracies for larger numbers of dimensions.

Remembering that one of our basic goals is to significantly reduce the number of
dimensions needed to accurately represent members of each class, we will examine the results
of training and testing the net with two, four, six, and eight eigen-dimensions, selecting
the dimensions based on minimum training error (min-error), maximum eigenvalue, and
FoM. The training and test sets will again consist of six classes, with five instances of each
class in each set. Table 4.2 gives the results of these tests; the source data for these tables
can be found in Appendix A. Two tables are provided here for each scheme, one giving the
accuracies in accepting/rejecting actual instances of the claimed identity (True Accept)
and rejecting/accepting false instances of the claimed identity (True Reject of imposters)
and one giving overall verification accuracy. Figure 4.9 provides a visual summary of the
overall accuracy of the face verifier using the different dimension reduction schemes. Note
that in each of the tests the FoM dimension selection scheme provided a substantially lower
verification accuracy than the minimum error or eigenvalue methods. This appears to
support the impression of poor separability provided by the clustering plots of Figures 4.6 and
4.7. Figure 4.9 indicates that when using six or eight dimensions, we will attain virtually
the same verification accuracy using dimensions selected by eigenvalue as we would if we
selected them based on minimum error. Because the KLT process naturally produces
dimensions ordered by eigenvalue, we will retain this method for selecting our dimensions for
face verification. Because six dimensions provide an adequate level of accuracy (96.7%
overall, with 96.0% true rejection, in Table 4.2), we will use that number for testing on the
face verifier portion of the identity verification system.

Figure 4.9  Face verification accuracy using two, four, six, and eight dimensions selected
by minimum error order, eigenvalue order, and figure of merit order.

4.4  Speaker Verification Dimensionality Reduction

In this section we shall perform a similar examination of speaker data dimensionality
as we did with face data, including analysing the clustering performance of the data
with dimensions selected through our three different criteria. We shall also examine the
verification performance of the speaker verifier using these different dimension selection
schemes.

4.4.1  Clustering with Two Dimensions.  Because our speaker verifier is based on
training and testing on codebooks 64 code-vectors long, the clustering plots will necessarily
contain many more data points than the face data plots did; with the six classes we will
use, there will be a total of 384 points on the plot. We would still hope, however, to
see some distinct clustering of these points by class, and Figures 4.10 through 4.15 show
that clustering behavior of some of the classes is evident, but there is substantial overlap
between the classes.

Figure 4.10  Clustering of six training classes (speakers) using the eigen-dimensions
producing the minimum training error.

Figure 4.11  Clustering of six test classes (speakers) using the eigen-dimensions producing
the minimum training error.

Figure 4.12  Clustering of six training classes (speakers) using the eigen-dimensions
corresponding to the two highest eigenvalues.

Figure 4.13  Clustering of six test classes (speakers) using the eigen-dimensions
corresponding to the two highest eigenvalues.

Figure 4.14  Clustering of six training classes (speakers) using the eigen-dimensions
corresponding to the two highest Figures of Merit.

Figure 4.15  Clustering of six test classes (speakers) using the eigen-dimensions
corresponding to the two highest Figures of Merit.

There does not appear to be a significant difference between any of the dimension
selection methods, so we next performed an analysis of verification performance using
four, six, eight, and ten dimensions selected by the three different schemes: minimum
error, eigenvalue, and FoM. Table 4.3 provides the results of these tests, in a format
identical to that used when presenting the face verification results. As with the face verifier
testing, source data for these tables are provided in Appendix A. With speaker data, as with
the face data, eigenvalue-based dimension selection provides superior performance
over the figure of merit, so this method shall be used for the speaker verifier portion of our
identity verifier. Because we wish to reduce the dimensionality of the problem,
we will select the top ten dimensions. Figure 4.16 provides a visual representation of the
verification accuracy for the different numbers of dimensions.

Table 4.3  Speaker Acceptance/Rejection Accuracy (%)

Type          Num Dimensions   Min-error   Eigenvalue   FoM
True Accept         4             38.9        52.8      58.3
True Reject         4             58.0        58.3      60.1
Overall             4             55.9        57.7      59.9
True Accept         6             77.8        38.9      61.1
True Reject         6             61.5        54.9      58.0
Overall             6             63.3        53.1      58.3
True Accept         8             77.8        58.3      66.7
True Reject         8             59.0        62.9      52.1
Overall             8             61.1        62.4      53.7
True Accept        10             88.9        47.2      72.2
True Reject        10             69.8        62.2      53.1
Overall            10             71.9        60.5      55.2

Figure 4.16  Speaker verification accuracy using four, six, eight, and ten dimensions
selected by minimum error order, eigenvalue order, and figure of merit order.

4.5  Nonlinear Feature Transformation


For this test, we used the backprop neural net and weight update rule as described
in Chapter 3. The system was first tested against a set of linearly separable, two-class toy
data, as shown in Figure 4.17. Rather than attempting to reduce the dimensionality of the
test space (from two to one in this case), we first examined the case of simply transforming
the original input dimensions into the same number of output dimensions. This could
provide some idea of whether and how the data would cluster in a transformed space; similar
cluster plots were shown earlier in the chapter for face and speaker data. Figure 4.18
shows the result of transforming the original data through the net after 1000 iterations (50
epochs).

Figure 4.17  Toy data set used for testing against the nonlinear transformation net.


Figure 4.18  Transformed toy data.

Note that the data clustered quite well, leading us to believe that this methodology
holds promise; the net was able to transform the data points into a more easily classifiable
space. Actual face data were next projected into an eigenspace, and 30 coefficients were
extracted from each of two classes of faces consisting of five prototypes apiece. Figures 4.19
through 4.23 show the movement of the data points as the number of training iterations
increases.


Figure 4.19  Transformed face data (100 iterations).

There is a definite movement of cluster centers toward the far corners of the reduced,
two-dimensional space, showing that, as with the toy data, distance maximization is taking
place. Note, however, that each of the clusters has four members of one class and one
member of the other class in it, or an 80% clustering accuracy. The net appears to be
finding a solution to the transformation problem that may not be optimal; it may be
getting caught in a local optimum. Recall that the neural net is not guaranteed to find the
best solution (for that matter, it is not guaranteed to find any solution), but may find some
local solution that is ‘almost’ correct, as was the case here. Nevertheless, the technique still
appears to have done a good job of reducing 30-dimensional data to only two dimensions
while still providing fair separability. Using the same data set, we also trained the net
for 1,000,000 iterations (100,000 epochs), and found that one of the aberrant points had
been pushed back across to its own class cluster, leaving one cluster with members of only
one class, and the other with all the members of the other class plus one member of the first, a
clustering accuracy of 90%. This seems to show that given enough time and/or different
initial parameterization the net may indeed be able to better separate the data in reduced
dimensions.

We  next  looked  at  a  five  class  problem,  using  five  different  classes  of  face  eigen- 
coefficients,  again  with  five  prototypes  of  each  class,  and  transformed  the  data  points  into 
a  two-dimensional  space.  Figure  4.24  shows  the  result  of  5,000  iterations  (200  epochs)  of 
the  data  through  the  net.  The  hope  of  distinct  clustering  does  not  materialize  for  this  data 
set,  though  some  clustering  has  occurred.  There  is  a  great  deal  of  class  overlap,  though, 
so  classification  would  likely  not  be  an  easy  task. 

Further training on this set collapsed the data to only two cluster points a very
small distance apart, making the transformed space virtually unclassifiable. With further
‘tweaking’, the performance of the transformation net on multi-class (greater than two)
problems could very likely be improved, but for the purposes of this research such tweaking
was not attempted. In the next chapter, we make recommendations for areas of future
research on and testing of this nonlinear dimensional reduction scheme.

Figure 4.24  Transformed Five Class Problem.


4.6  Fusion  of  the  Face  and  Speaker  Verifiers 

One traditional scheme for sensor fusion has been to fuse the post-probabilities from
the outputs of the individual sensors, and we first examined this method for our fused
face/speaker verification system. Using the four common classes between the face and
speaker data described earlier in this chapter, testing was performed to find the linear
combination of net output face/speaker probabilities that provided the highest verification
probability. As can be easily seen by the individual class accuracies provided in the tables
in Appendix B, the accuracy of the face verifier appears to overpower that of the speaker
verifier. Recall, though, that the verification scheme has been based on a simple 0.5
threshold; that is, if the probability calculated from the output of the net is greater than
0.5, the verifier is said to have accepted the individual as who he claims to be. Conversely,
a post-probability of less than 0.5 will indicate non-acceptance of the individual. We found
that the levels of probability out of the face verifier were so high (for acceptance) and so low
(for rejection) that the face probability still greatly overpowered the speaker probability,
the levels of which did not stray to the extremes nearly as often or as significantly. Based
strictly on the probability levels, we would select 100% face verification and 0% speaker
verification for the ‘fused’ verifier.

What  is  more  important  than  simply  the  levels  of  the  probability  output,  however,  is 
the  accuracy  of  verification.  For  this  system,  a  verification  probability  of  0.9  provides  the 
same  correct  (or  incorrect)  answer  as  a  probability  of  0.6.  Therefore,  the  decision  was  made 
to  track  the  verification  accuracy  of  the  system  as  a  function  of  the  linear  combinations  of 
the  outputs  of  the  two  verifiers.  Table  4.4  gives  the  result  of  this  test,  showing  that  the 
highest  fused  verification  accuracy  was  achieved  when  using  40%  face  and  60%  speaker 
probabilities  (recall  that  True  Accept  shows  the  accuracy  in  accepting  actual  instances 
of  the  individual,  and  True  Reject  shows  accuracy  in  rejecting  imposters).  This  may  be 
somewhat  counter-intuitive,  but  is  due  to  both  the  smaller  variation  in  the  levels  of  speaker 
verifier  output  probability  and  the  different  type  of  information  examined  by  the  individual 
verifiers;  a  poor  performance  by  one  verifier  can  be  compensated  for  by  good  performance 
by  the  other. 
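
The weight sweep described above can be sketched as follows. This is an illustrative sketch, not the thesis code: the function names are hypothetical, the sweep values follow the 10% steps of Table 4.4, and any probabilities passed in are placeholders rather than data from the experiments.

```python
import numpy as np

def fuse(p_face, p_speaker, face_weight):
    """Linear combination of the two verifiers' output probabilities."""
    return face_weight * np.asarray(p_face) + (1.0 - face_weight) * np.asarray(p_speaker)

def accuracy(p_fused, is_genuine, threshold=0.5):
    """Return (true accept, true reject, overall) rates for a 0.5 threshold."""
    genuine = np.asarray(is_genuine, dtype=bool)
    accept = np.asarray(p_fused) > threshold
    true_accept = accept[genuine].mean()        # genuine claims accepted
    true_reject = (~accept[~genuine]).mean()    # imposters rejected
    overall = (accept == genuine).mean()
    return true_accept, true_reject, overall

def best_face_weight(p_face, p_speaker, is_genuine,
                     weights=np.linspace(0.1, 0.9, 9)):
    """Pick the face weight giving the highest overall accuracy."""
    scores = [accuracy(fuse(p_face, p_speaker, w), is_genuine)[2] for w in weights]
    return float(weights[int(np.argmax(scores))])
```

Scoring each weighting by thresholded accuracy rather than by the raw probability level is what lets a verifier with less extreme outputs contribute: it only has to tip the combined value to the correct side of 0.5.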


Table 4.4  Accuracy of fused test classes.

                        Acceptance/rejection accuracy (%)
% Face   % Speaker   True Accept   True Reject   Overall
  10        90            69           72.9         72
  20        80            94           77.1         81
  30        70           100           87.5         91
  40        60           100           89.6         92
  50        50           100           81.3         86
  60        40           100           83.3         88
  70        30           100           79.2         84
  80        20           100           79.2         84
  90        10           100           81.3         86

²This may not be strictly true. We would actually perform a statistical analysis of the data, and
determine if the speaker probabilities could narrow some confidence interval within which we could expect
some restrictive percentage of correct verification answers to lie.

To test the fused identity verification system, we selected nine individuals from whom
to collect both face and speaker data. Recall that for the fused system it was determined
that six-dimensional face data and ten-dimensional speaker data (both sets of dimensions
selected by descending eigenvalue order), and a fusion ratio of 40% face and 60% speaker
would be used. Table 4.5 shows the accuracy of each separate verifier taken alone and the
overall fused accuracy. The most important thing to notice about this table is that the
performance of the fused identity verifier is indeed improved over that of either of the two
components.


Table 4.5  Fused Identity Verification Accuracy.

Face Verifier   Speaker Verifier   Identity Verifier
    55%               61%                66%

It should also be mentioned that the levels of face verification accuracy found during
this test appear to be substantially lower than those found during the tests earlier in this
chapter. This is due to the increased number of classes used during this portion of the
testing; we used a total of 50% more classes for this test (nine, versus six for the earlier
tests).  Additionally,  an  effort  was  made  to  purposely  select  individuals  without  radically 
different  appearances,  creating  a  somewhat  more  difficult  classification  problem.  The  goal 
was  to  see  if  sub-optimal  performance  of  the  face  verifier  could  be  overcome  by  input  from 
the  speaker  verifier,  and  the  results  appear  to  indicate  this  is  the  case. 

To provide a ‘sanity check’ that the ratio of face to speaker probabilities selected was
appropriate, we also ran a test on the nine classes of test data, with the results shown in
Table 4.6. Notice that if the decision had been based on this data, we would have selected
the 50%/50% mix, but the 40%/60% chosen also performed quite well.

Table 4.6  Accuracy of fused test classes.

                        Acceptance/rejection accuracy (%)
% Face   % Speaker   True Accept   True Reject   Overall
  10        90            53            63          62
  20        80            67            62          63
  30        70            72            63          64
  40        60            83            64          66
  50        50            97            64          68
  60        40           100            61          65
  70        30           100            59          64
  80        20           100            54          59
  90        10           100            52          57

In the next chapter we will summarize the conclusions resulting from the testing
described in this chapter. We shall also present a summary of areas recommended for
future research.

V.  Conclusion


5.1  Introduction

This  chapter  will  provide  a  short  summary  of  the  purpose  and  results  of  this  research 
effort,  and  will  suggest  areas  suitable  for  future  research.  Recall  that  our  overall  goals  were 
threefold: 

•  To  convert  existing  face  recognition  and  speaker  identification  systems  to  face  and 
speaker  verification  systems. 

•  To  determine  whether  the  method  of  dimensionality  reduction  currently  used  for  the 
face  recognition  system  was  suitable  for  the  task. 

•  To  fuse  the  face  and  speaker  verifiers  in  the  hope  of  providing  enhanced  performance 
over  either  of  the  individual  verifiers  alone. 

These  items  will  be  addressed  in  the  following  sections. 

5.2  Conversion From Recognition to Verification

5.2.1  Face Verifier.  The existing neural-net based face recognition system was
converted into a verifier by altering the training methodology. Instead of training the net
to recognize all the individuals in the training database, a separate net was trained for each
person. The feature set used consisted of the KLT coefficients extracted from the original
64 x 64 pixel images. Each net was trained to recognize two classes: the person for whom
the  net  was  trained  and  everyone  else  (Joe  and  not-Joe).  Training  was  accomplished  by 
presenting  prototypes  of  the  individual  as  instances  of  one  class,  and  all  other  prototypes 
in  the  training  database  as  instances  of  the  other  class.  With  such  a  scheme  we  hoped  to 
train  the  net  to  ‘learn’  what  makes  the  individual  different  from  an  average  person,  where 
the  average  person  is  represented  by  all  the  other  people  in  the  training  set. 
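
The Joe/not-Joe training scheme can be sketched as a relabeling of one shared training set. The function names and the string identities below are hypothetical, and the sketch covers only the construction of target labels, not the training of the nets themselves.

```python
def one_vs_rest_targets(prototype_owners, claimed_identity):
    """Target 1 for prototypes of the claimed identity, 0 for everyone else."""
    return [1 if owner == claimed_identity else 0 for owner in prototype_owners]

def targets_for_all_nets(prototype_owners):
    """One target vector per individual, i.e. one two-class net per person."""
    return {person: one_vs_rest_targets(prototype_owners, person)
            for person in sorted(set(prototype_owners))}
```

For a training set whose prototypes belong to "joe", "joe", "ann", and "bob" in that order, the net for "joe" would be trained against the targets 1, 1, 0, 0.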

This system worked well, providing an overall verification accuracy of greater than
96% for a six class problem using six features for each prototype. The system provided
100% true acceptance accuracy (correctly identifying new instances of the individual) and
96% true rejection accuracy (correctly rejecting instances of people other than the
individual).


5.2.2  Speaker Verifier.  The original speaker identifier was based on Cepstral
extraction, and classified using a distortion metric in a 20-dimensional Cepstral space. To
increase  the  commonality  between  the  face  and  speaker  verifiers,  we  chose  to  convert  this 
system  to  a  neural  net  based  verifier,  using  the  extracted  Cepstral  coefficients  as  the  base 
features  for  each  speaker.  As  with  the  face  verifier,  these  features  were  projected  into  KLT 
space  and  KLT  coefficients  were  extracted  for  presentation  to  the  net.  The  training  scheme 
was  identical  to  that  used  for  the  face  verifier,  training  a  net  for  each  individual  in  the 
training  database. 

The system showed substantially lower accuracy than the face verifier when tested
on a nine class, ten-dimensional problem. Overall accuracy was 60.5%, derived from 47%
true acceptance accuracy and 62% true rejection accuracy. This decrease in accuracy is not
unexpected, and is due, we feel, primarily to one specific cause. All of the data (face and
speaker)  was  obtained  on  a  single  day.  With  face  data,  it  was  a  relatively  straightforward 
process  to  ensure  consistent  positioning  and  scale  of  the  face  within  the  image  using  manual 
segmentation.  Capturing  the  speaker  data,  however,  was  a  much  less  consistent  process, 
and it has been shown that more samples of speech would be necessary to form an ‘accurate’
representation of an individual (9). Nonetheless, the goal was not necessarily to provide
some  specific  level  of  speaker  verification  accuracy,  but  instead  to  develop  a  baseline  with 
which  to  compare  the  performance  of  the  fused  identity  verification  system. 

5.3  Dimensionality  Reduction 

The  method  used  to  select  the  number  of  features  to  present  to  the  neural  net  was 
based  on  KLT  analysis  for  the  base  verification  systems.  We  questioned  whether  selecting 
the  dimensions  based  on  the  eigenvalues  of  the  covariance  matrix  of  the  original  data  was  a 
suitable  metric.  As  a  baseline,  we  trained  nets  for  each  of  the  individuals  and  determined 
the  eigen-dimensions  to  select  which  would  result  in  the  minimum  training  error.  We 
then  trained  each  net  using  specific  numbers  of  eigenvalue  ordered  dimensions  (the  KLT 
method), and also the same number of dimensions ordered by a new figure of merit based
on  the  ratio  of  inter-class  means  to  intra-class  variance.  It  was  found  that  the  eigenvalue 
ordered  dimension  selection  performed  nearly  as  well  for  faces  as  if  the  dimensions  had 
been  selected  based  on  minimum  training  error,  and  substantially  better  than  the  figure 
of  merit  method.  With  speakers,  the  eigenvalue  method  worked  somewhat  better  than 
the  figure  of  merit  method,  but  provided  poorer  performance  than  the  baseline,  minimum 
training  error  method.  Therefore  it  was  decided  that  the  fused  verification  system  would 
use  the  KLT  dimensional  selection  method  for  both  the  face  and  speaker  verifiers. 
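
To illustrate the kind of criterion being compared with eigenvalue ordering, the sketch below scores a single dimension by the ratio of the variance of the class means to the average intra-class variance. The exact figure of merit is the one defined in Chapter 3; this F-ratio form, the function name, and the data layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Score one feature dimension (hypothetical helper, not thesis code).
   x         : samples for this dimension, grouped by class, so that
               class c occupies x[c*per_class] .. x[(c+1)*per_class - 1]
   classes   : number of classes (at least 2)
   per_class : samples per class (at least 2)
   Returns (variance of class means) / (mean intra-class variance). */
double f_ratio(const double *x, size_t classes, size_t per_class)
{
    assert(x != NULL && classes >= 2 && per_class >= 2);

    size_t n = classes * per_class;
    double grand = 0.0;
    for (size_t i = 0; i < n; i++)
        grand += x[i];
    grand /= (double)n;

    double between = 0.0, within = 0.0;
    for (size_t c = 0; c < classes; c++) {
        const double *cls = x + c * per_class;
        double mean = 0.0;
        for (size_t k = 0; k < per_class; k++)
            mean += cls[k];
        mean /= (double)per_class;

        /* spread of this class mean about the grand mean */
        between += (mean - grand) * (mean - grand);
        /* spread of the samples about their own class mean */
        for (size_t k = 0; k < per_class; k++)
            within += (cls[k] - mean) * (cls[k] - mean);
    }
    between /= (double)(classes - 1);
    within  /= (double)(classes * (per_class - 1));

    if (within == 0.0)                 /* classes have no internal spread */
        return between > 0.0 ? DBL_MAX : 0.0;
    return between / within;
}
```

Dimensions would then be sorted by descending score and the leading ones kept, in the same way that the KLT method keeps the dimensions with the largest eigenvalues.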

We  also  explored  a  method  of  dimension  reduction  using  a  nonlinear,  transformation 
neural  net,  hoping  to  determine  a  new,  reduced  set  of  dimensions  that  would  provide 
better  separability  than  does  the  KLT  or  figure  of  merit  methodology.  Using  a  standard, 
backprop  net,  we  first  trained  on  two  classes  of  data,  using  for  weight  update  a  rule  based  on 
‘pushing’  elements  of  differing  classes  apart  in  the  transformed  space.  When  transforming 
two-class,  linearly  separable,  two-dimensional  toy  data  into  a  new  two-dimensional  space, 
the  net  clustered  the  data  points  to  two  points  at  opposite  corners  of  the  new  space, 
the  desired  effect.  When  two  classes  (five  prototypes  each)  of  actual  30-dimensional  face 
data were presented to the net, we again saw the same, distinct two-point clustering
take  place,  but  each  cluster  contained  four  data  points  from  one  class  and  one  from  the 
other.  We  also  presented  five  classes  of  face  data  to  the  net,  and  found  that  though  some 
clustering  did  take  place,  there  was  a  great  deal  of  class  overlap  in  the  transformed  space. 
Preliminary  classification  testing  using  these  transformed  dimensions  showed  significantly 
poorer  performance  than  when  eigen-dimensions  based  on  either  KLT  or  our  figure  of  merit 
were  selected.  We  did  not  pursue  this  methodology  any  further. 

5.4  Face/Speaker  Fusion 

As described above, the identity verification system was based on extraction of six
KLT  face  coefficients  and  ten  KLT  speaker  coefficients  presented  to  nets  specifically  trained 
for  each  individual  in  the  training  set.  Using  training  data,  the  fused  system  was  tested  to 
determine  the  optimal  ratio  of  face  net  output  to  speaker  net  output  to  provide  maximum 
verification  accuracy,  and  it  was  found  that  a  ratio  of  40%  face  to  60%  voice  provided 
the best performance. With individual verification systems attaining only 55% (face) and
61%  (speaker)  accuracy,  the  fused  system  was  able  to  attain  66%  overall  accuracy  using 
this  ratio.  This  shows  that  the  fusion  did  indeed  increase  the  performance  of  the  identity 
verification  system  over  either  of  the  base  verifiers. 

5.5  Future  Research 

We feel there are three primary areas that warrant further research. The first
involves segmentation of faces from images, and the second involves exploring the effect of
using multiple days' training and test data with the identity verifier. Finally, the nonlinear
transformation net has shown substantial promise, and we feel with appropriate ‘tweaking’
it could serve a useful role in the dimension selection arena.

5.5.1  Face  Segmentation.  We  have  made  the  assumption  in  this  research  that 
methods  can  be  found  to  segment  the  face  from  an  image  and  position,  scale,  and  rotate 
it  into  a  desired,  standard  position.  One  promising  method  of  finding  the  face  has  been 
explored in a collateral thesis by McCrae and involves color segmentation (36). She has
trained a backprop net to recognize face color and has been able to successfully extract
the face from an image. Additionally, she has been able to use the same technique to
locate the eyes within the face. Once the positions of the eyes are known, the image can be
scaled, shifted, and rotated until the eye points overlay some specific locations consistent
for every individual.

Another  method  showing  promise  for  segmentation  uses  three  dimensional,  temporal 
wavelets,  introduced  by  Burns  in  his  Doctoral  dissertation  (6).  These  wavelets  can  be 
used  to  detect  the  movement  of  faces  within  images  over  time,  and  have  the  additional 
advantage  of  removing  stationary  backgrounds  from  the  images.  This  could  make  for  more 
robust  classification/verification  over  systems  requiring  constant  image  backgrounds  (over 
the  entire  training  and  test  sets)  for  good  performance. 

5.5.2  Multiple  Day  Problem.  The  multiple  day  face  classification  problem  has 
not been extensively addressed in the literature, though Colombi, Krepp, et al. have studied
it at some length here at AFIT (9, 27). The basic problem is that though a face recognizer
may function quite well when using a single day’s data, when presented with data from
other days the performance may be severely degraded. The fusion of face and speaker
classification technologies may help to solve this problem by providing additional information
to the decision-making process. We have shown that the fusion of face and speaker
verification can provide better performance for a single day’s data than either verifier alone,
and believe that this performance increase will also be seen when the data set is extended
to include data from some larger, but still small, number of days.

To  extend  this  concept,  fusing  other  biometric  classification  systems  with  the  identity 
verifier  developed  for  this  effort  may  provide  even  better  performance.  Work  is  being 
done  in  concurrent  theses  on  fingerprint  identification  and  written  word  recognition,  and 
we  believe  these  techniques  can  be  adapted  to  synergistically  combine  with  our  verifier 
(32,  58). 


5.5.3  Nonlinear  Dimensionality  Transformation.  Though  the  initial  testing  with 
classification using the nonlinear dimensional transformation net did not show any
increased, multi-class classification capability, there were some interesting results which lead
us to believe the method holds promise. The clustering tendencies exhibited show that
it may be possible to push class data points to different points in the transformed space,
given the proper weight update rule. We used a very simple rule based on pushing apart
members of opposite classes; this rule could be altered in several different ways. It could
be implemented in such a way that members of like classes could be attracted to each
other, while members of different classes would still be repelled. The net could also be
implemented with either of the above rules, but in a batch style, with all training
vectors presented to the net before updating the weights. An alternate update rule could
be derived based on a similar scheme to the figure of merit we developed for this thesis,
maximizing the variance of means between different classes and minimizing the intra-class
means. Other transformation methodologies within the net could also be explored, such
as  adding  additional  hidden  layers  or  using  linear  activations  on  the  output  nodes  of  the 
net. 

5.6 Conclusion


In  the  course  of  this  research,  we  have  found  the  answer  to  providing  an  identity 
verification  capability  to  be  the  fusion  of  face  and  speaker  verification  systems.  Each  of 
these systems is based on KLT dimensionality orthogonalization and reduction, followed
by  back  propagation  neural  net  classification.  We  determined  that  using  the  KLT  provided 
superior  performance  to  using  either  a  nonlinear  transformation  net  or  a  figure  of  merit 
based  on  the  F-ratio. 

We  also  found  that  for  verification  purposes,  a  collection  of  training  prototypes  of 
different  individuals  can  be  used  to  represent  a  single  class,  an  ‘average’  person.  Using 
this  concept,  a  neural  net  can  be  trained  to  recognize  the  difference  between  an  individual 
and  this  average  person.  A  net  must  be  trained  for  each  individual  based  on  this  two-class 
idea, and when anyone presents themselves claiming to be a specific person, the net trained
for  that  individual  must  be  used  to  test  the  claim. 

Appendix A. Software Development


A.1 Introduction

In  this  Appendix  we  shall  examine  the  software  modified  or  developed  in  support  of 
this  research.  All  code  was  written  in  the  ANSI  Standard  C  programming  language;  though 
implemented  on  a  Sun  SPARCstation  2,  the  great  majority  of  it  should  be  portable  between 
platforms.  All  the  code  developed  for  this  effort  is  included  at  the  end  of  this  appendix  for 
reference.  Much  of  the  code  is  based  on  software  developed  during  previous  AFIT  thesis 
efforts,  and  such  information  will  be  provided  in  the  headers  of  the  code  listings. 

We  will  look  first  at  the  code  written  for  the  face  verification  portion  of  the  system, 
and  then  at  the  code  written  for  the  speaker  verifier.  We  will  next  examine  some  of  the 
techniques  developed  to  implement  modifications  to  the  individual  verification  systems, 
and  will  finally  look  at  the  code  used  to  tie  together  the  two  separate  verifiers  into  a  single 
identity  verification  system. 

A.2 Implementing the Face Verifier

The  face  verification  system  was  designed  to  operate  in  three  distinct  stages: 

1.  Acquisition  of  images  for  training  and/or  testing. 

2.  Training  of  the  multi-layer  perceptron  for  each  user. 

3.  Testing  of  new  instances  of  a  user  claiming  to  be  a  member  of  the  training  database. 

For  the  first  stage,  the  program  acquire  was  developed  to  capture  images  from  a 
video  camera  connected  to  a  VideoPix  frame  grabber  installed  in  the  workstation.  The 
training of the neural net is accomplished by train_net, and verify_face_net permits
testing of the face verifier. Each of these programs is described in greater detail in the
following  sections. 

A.2.1 acquire. This program was designed to allow easy acquisition of face images
for  either  training  or  testing  the  verification  system.  If  beginning  a  new  acquisition  session, 
the  program  will  first  build  training  and/or  testing  sub-directories  in  which  to  place  the 
captured images; when adding images to an existing database, the new captures will be
placed  into  the  previously  created  sub-directories. 

The  software  controls  the  frame-grabbing  capabilities  of  the  VideoPix  video  capture 
board,  which  is  connected  to  the  output  of  a  video  camera,  and  the  user  is  allowed  to 
capture  images  at  an  arbitrary  rate  and  time  (limited  by  the  hardware  to  four  frames  per 
second).  The  captured  image  is  presented  to  the  user  for  examination,  and  if  satisfactory 
is  saved  to  the  appropriate  sub-directory.  This  process  continues  until  the  desired  number 
of  images  have  been  captured. 

A.2.2 train_net. This code allows training on each of the target images in the
image  database  and  produces  weight  files  for  each  target.  It  first  uses  the  entire  population 
of  training  images  to  calculate  the  orthogonal  eigenspace  into  which  to  project  the  faces, 
and  then  saves  the  appropriate  number  of  bases  (where  that  number  is  user-determined)  as 
eigenface  files.  The  eigenfaces  are  then  used  to  extract  KLT  coefficients  from  each  training 
file,  and  these  coefficients  are  written  to  a  single  data  file  for  presentation  to  the  neural 
net. 
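
Extracting the KLT coefficients amounts to taking inner products between an image and the saved eigenfaces. The sketch below assumes orthonormal eigenfaces stored row by row and subtraction of the average face before projection; the function name and memory layout are hypothetical and are not those of train_net.

```c
#include <assert.h>
#include <stddef.h>

/* Project one image onto the first num_coefs eigenfaces
   (hypothetical helper, not thesis code).
   image, mean : dim pixel values each (dim = 64*64 for the images used here)
   eigenfaces  : num_coefs rows of dim values, assumed orthonormal
   coefs       : output array of num_coefs KLT coefficients */
void klt_project(const double *image, const double *mean,
                 const double *eigenfaces, size_t dim,
                 size_t num_coefs, double *coefs)
{
    assert(image != NULL && mean != NULL);
    assert(eigenfaces != NULL && coefs != NULL);

    for (size_t k = 0; k < num_coefs; k++) {
        const double *basis = eigenfaces + k * dim;
        double dot = 0.0;
        /* inner product of the mean-subtracted image with basis k */
        for (size_t p = 0; p < dim; p++)
            dot += basis[p] * (image[p] - mean[p]);
        coefs[k] = dot;
    }
}
```

With six eigenfaces retained, each 4096-pixel face image is reduced to the six-element feature vector presented to the net.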

Recall  that  because  this  is  a  verification  net,  there  are  only  two  outputs,  verified  or 
not  verified.  For  each  target  on  which  the  net  is  to  be  trained  we  augment  all  the  target’s 
coefficients  with  the  desired  node  outputs  of  0.9  and  0.1,  and  all  other  coefficients  with 
0.1  and  0.9.  The  net  is  then  trained  on  this  data  to  the  desired  levels  of  accuracy  and 
error,  and  the  weights  calculated  for  the  target  are  saved  to  a  separate  file.  The  process  is 
continued  until  a  net  has  been  trained  for  all  targets  represented  in  the  training  database 
and  weight  files  have  been  produced  for  each. 
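
The target assignment described above can be sketched as follows. The 0.9 and 0.1 values come from the text; the function name, the integer identity labels, and the two-values-per-vector layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TARGET_HIGH 0.9   /* desired output of the "on" node  */
#define TARGET_LOW  0.1   /* desired output of the "off" node */

/* Fill the desired outputs for a two-class verification net
   (hypothetical helper, not thesis code).
   labels[i] : identity index of training vector i
   n         : number of training vectors
   target_id : the individual this net is being trained for
   desired   : output, 2 values per vector (verified, not verified) */
void make_targets(const int *labels, size_t n, int target_id, double *desired)
{
    assert(labels != NULL && desired != NULL);
    for (size_t i = 0; i < n; i++) {
        int is_target = (labels[i] == target_id);
        desired[2 * i]     = is_target ? TARGET_HIGH : TARGET_LOW;
        desired[2 * i + 1] = is_target ? TARGET_LOW  : TARGET_HIGH;
    }
}
```

Calling this once per individual, with a different target_id each time, yields the separate training sets from which the per-person weight files are produced.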

A.2.3 verify_face_net. Performing verification with the trained net is accomplished
using this program, where a test image may already exist in a file or may be
captured live for presentation to the net. One of the command line arguments is the
‘claimed’ identity of the image to be verified, and once the program is invoked it will build
a net based on the weights calculated by train_net.c for that particular individual. After
the test image is presented, the outputs of the net are adjusted for bias as explained
in Chapter 3, and the pseudo post-probability that the test image is an instance of the
claimed individual is calculated.

A.2.4 mlp_fuse. This code is the actual back-prop neural net used for classification
by both the training and verifying programs. A complete description and source code
listing can be found in the 1991 thesis by Krepp (27).

A.3 Implementing the Speaker Verifier

The  speaker  verifier  is  implemented  quite  similarly  to  the  face  verifier,  a  fact  which 
allowed  substantial  code  re-utilization.  Again,  a  three  stage  process  was  used: 

1.  Acquisition  of  speaker  data  for  training  and/or  testing. 

2.  Training  of  the  multi-layer  perceptron  for  each  user. 

3.  Testing  of  new  instances  of  a  user  claiming  to  be  a  member  of  the  training  database. 

The major differences in the two verifiers lie in the data acquisition phase. For capturing
and pre-processing speaker data, the ESPS® software package was used extensively,
including the routines mu2eps, filter, formant, refcof, select, and vqdes. A windowed
menu environment, SIDtool (Speaker Identification Tool), was developed to implement a
series of shell script files, which in turn called the routines listed above. After all the
speech was captured, train_net was again used to train a net for each target individual,
and finally verify_voice_net was used to test the verification performance of the system.

A.3.1 mu2eps. This routine converts a standard Sun audio file captured
with the built-in SPARCstation audio equipment into a sampled format usable by the other
ESPS routines.

A.3.2 filter. An FIR (finite impulse response) pre-emphasis filter is applied to
the  audio  file  by  filter. 
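
A first-order FIR pre-emphasis filter of this kind computes y[n] = x[n] - a x[n-1], boosting the high-frequency content of the speech before spectral analysis. The sketch below is not the ESPS filter routine; the function name is hypothetical, and the coefficient a is left to the caller because the value used in this work is not stated here (values near 0.95 are common in speech processing).

```c
#include <assert.h>
#include <stddef.h>

/* First-order FIR pre-emphasis: y[n] = x[n] - a * x[n-1]
   (hypothetical helper, not the ESPS routine).
   x : input samples
   n : number of samples
   a : pre-emphasis coefficient chosen by the caller
   y : output samples; may be the same array as x */
void pre_emphasize(const double *x, size_t n, double a, double *y)
{
    assert(x != NULL && y != NULL);
    double prev = 0.0;              /* x[-1] is taken to be zero */
    for (size_t i = 0; i < n; i++) {
        double cur = x[i];          /* read first so x and y may alias */
        y[i] = cur - a * prev;
        prev = cur;
    }
}
```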

A.3.3 formant. formant applies a probability-of-voicing label to each frame of
speech.

A.3.4 refcof, spectrans. These programs generate 20 Cepstral coefficients for
each frame of speech.

A.3.5 select. Using this routine, the frames containing voiced speech (determined
by the probability-of-voicing label) are segmented from the utterance.

A.3.6 vqdes. vqdes takes the raw Cepstral data and builds a codebook for each
individual consisting of the 64 codewords resulting from the LBG clustering process. These
codebooks are used to train and test the nets.

A.3.7 train_net. This program functions as it did for training on images, projecting
the speaker data into KLT space and training on the coefficients for each target. Weight
files are also produced, corresponding to the nets built and trained for each individual.

A.3.8 verify_voice_net. As with verify_face_net, speaker data is presented to the
appropriate  net  for  verification  and  the  pseudo  post-probability  of  verification  is  computed. 

A.4 Implementing Verification Modifications

To  compare  the  performance  of  the  verification  nets  using  eigen-bases  selected  by 
different criteria, capability was built into train_net to allow selection of the eigenvectors
based  on  eigenvalue,  the  figure  of  merit  discussed  in  Chapter  3,  or  simple  user  selection. 
Additionally,  a  method  of  non-linear  transformation  of  the  original  eigen-bases  to  a  new 
reduced  basis  set  was  explored  via  the  program  trainxnet. 

A.5 Implementing the Identity Verifier

The program verify_identity was written to fuse the verification probabilities produced
by the programs verify_face_net and verify_voice_net. It functions by invoking
the two verifiers in turn, and then simply calculating the linear combination of probabilities
that resulted in the best overall system performance during training.
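
A minimal sketch of such a linear combination follows. The 0.4 and 0.6 weights are the face-to-voice ratio reported in Section 5.4; the function names and the caller-supplied acceptance threshold are assumptions and are not details of verify_identity.

```c
#define FACE_WEIGHT  0.4   /* face share of the fused score (Section 5.4)  */
#define VOICE_WEIGHT 0.6   /* voice share of the fused score (Section 5.4) */

/* Fuse the two pseudo post-probabilities into a single score
   (hypothetical helper, not thesis code). */
double fuse_scores(double p_face, double p_voice)
{
    return FACE_WEIGHT * p_face + VOICE_WEIGHT * p_voice;
}

/* Accept the identity claim when the fused score reaches the threshold.
   The threshold is an assumption left to the caller. */
int accept_claim(double p_face, double p_voice, double threshold)
{
    return fuse_scores(p_face, p_voice) >= threshold;
}
```

Since the weights sum to one, the fused score stays in the same range as the two inputs, so a threshold chosen for a single verifier remains meaningful for the fused output.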

A.6 Code


The source code for most of the face and speaker verification programs described
above is included here. As mentioned, the scripts for controlling the speaker capture and
Cepstral processing can be found in a collateral thesis by Prescott (41).

A.6.1 acquire.c.


/*********************************************************************

Program: acquire.c

Description: This program was written to allow the capture of grayscale
   images from a video camera and the VideoPix image capturing hardware
   in a Sun SPARCstation. It will allow the creation of a new database
   of training images, or the addition of new images to an existing
   database. It will similarly allow the creation of a new database of test
   images or the addition of new images to an existing set. Provisions
   are included to either capture the images using manual segmentation,
   or to capture images using Gay’s automatic motion segmentation routines.

Author: John G. Keller

Date: 1 Sep 93

*********************************************************************/


#include <stdio.h>
#include <string.h>
#include "vfc_lib.h"
#include "globals.h"
#include "jkmacros.h"


int   i,
      finished,
      done,
      quit,
      num_protos,
      num_train = 0,
      num_class = 1;

FILE  *fnograb;

char  command[80],
      u_name[8],
      nu_name[8],
      filename[20],
      waste[2],
      another[4],
      answer[4];

main(int argc, char *argv[])
{
  int   testnum = 0;
  char  testdir[30], show[12];

  system("clear");

  /********* Make a Training Folder to Hold the Prototypes *******/

  if (fopen("./train_images/*.gra") == NULL) system("mkdir train_images");
  if ((argc == 2 || argc == 3) && strcmp("add", argv[1]) == 0)
    {
      open_read(fnograb, "nograb_param.dat");
      fscanf(fnograb, "%d\n", &num_protos);
      fscanf(fnograb, "%d\n", &num_class);
      fscanf(fnograb, "%d", &num_train);
      fclose(fnograb);
    }
  else if (argc == 2 && strcmp("test", argv[1]) != 0)
    {
      printf("\n\nSYNTAX: acquire <add><test>.\n\n");
      exit(0);
    }

  /********* Prompt User for Number of Prototypes if necessary *******/

  if ((argc == 2 || argc == 3) && strcmp("add", argv[1]) == 0)
    {
      system("clear");
      printf("\nYou have chosen to add one or more individuals to the existing database.\nFor this database you will need %d prototypes for each subject.\n", num_protos);
      num_class++;
    }
  else
    {
      done = 0;
      while (!done)
        {
          printf("\nEnter the number of prototypes to be used for each user <1-64>: ");
          scanf("%d", &num_protos);
          /*gets(waste);*/
          printf("\n");
          /* accept the full 1-64 range advertised in the prompt */
          if ((num_protos <= 64) && (num_protos >= 1))
            {
              done = 1;
            }
          else
            printf("\nYou need to do at least 1 and at most 64.\n");
        }
    }

  /***************** Enter Users Until You're Done **************/

  while (!finished)
    {
      done = 0;
      quit = 0;

      /******** Prompt User for User Name *******/

      printf("\nEnter the person's username <8 letters or less>: ");
      scanf("%s", u_name);
      gets(waste);
      printf("\n");
      i = 0;
      while (u_name[i] != '\0')
        {
          if (strlen(u_name) > 8)
            {
              printf("\nSorry, you're limited to 8 letters.\n");
              break;
            }
          if (isalpha(u_name[i]))
            i++;
          else
            {
              printf("\nSorry, you can't enter any numerics into the user name.\n");
              break;
            }
        }

      while (!done)
        {
          printf("\nThe name you entered was : %s\n", u_name);
          printf("Please re-enter the name if necessary or press return to continue. ");
          gets(nu_name);
          if (nu_name[0] == '\0') done = 1;
          else
            {
              strcpy(u_name, nu_name);
              nu_name[0] = '\0';
            }
        }

      /*********** grab training images of the user ***************/

      Loopli(num_protos)
        {
          sprintf(filename, "%s%d", u_name, i);
          autograb(filename);
          sprintf(show, "display%d", SM_WIDTH);
          sprintf(command, "%s %s.gra", show, filename);
          system(command);
          printf("\nIs the picture satisfactory (y/n)? ");
          gets(answer);
          if (answer[0] == 'y')
            {
              if (i < num_protos) printf("\nOkay, now for prototype number %d.\n", i+1);
              sprintf(filename, "%s%d%s", u_name, i, ".gra");
              sprintf(command, "mv %s ./train_images", filename);
              system(command);
              num_train++;
            }
          else i -= 1;
        }

      /******** Grab test images if the 'test' argument was used ***************/

      if ((argc == 3 && strcmp(argv[2], "test") == 0) || (argc == 2 && strcmp(argv[1], "test") == 0))
        {
          if (testnum == 0)
            {
              printf("\nHow many test images will you want for each user? ");
              scanf("%d", &testnum);
              gets(waste);
              if (fopen("./test_images/*") == NULL)
                {
                  sprintf(command, "mkdir ./test_images", testdir);
                  system(command);
                }
            }

          Loopli(testnum)
            {
              sprintf(filename, "%s%dt", u_name, i);
              autograb(filename);
              sprintf(command, "%s %s.gra", show, filename);
              system(command);
              printf("\nIs the picture satisfactory (y/n)? ");
              gets(answer);
              if (answer[0] == 'y')
                {
                  if (i < testnum) printf("\nOkay, now for test image number %d.\n", i+1);
                  sprintf(filename, "%s%dt%s", u_name, i, ".gra");
                  sprintf(command, "mv %s ./test_images", filename, testdir);
                  system(command);
                }
              else i -= 1;
            }
        }

      printf("\nProcess another individual? <y or n>: ");
      gets(another);
      while (!quit)
        {
          if ((another[0] == 'n') || (another[0] == 'N'))
            {
              finished = 1;

              /********* Save the nograb_param.dat information *************/

              open_write(fnograb, "nograb_param.dat");
              fprintf(fnograb, "%d\n", num_protos);
              fprintf(fnograb, "%d\n", num_class);
              fprintf(fnograb, "%d\n", num_train);
              fclose(fnograb);
              quit = 1;
            }
          else if ((another[0] == 'y') || (another[0] == 'Y'))
            {
              finished = 0;
              num_class++;
              quit = 1;
            }
          else
            {
              printf("\n\nHit y if you want to enter another user.\nHit n if you're done entering users. : ");
              gets(another);
              printf("\n");
            }
        }
    }

  system("rm *.sn");
  system("rm *.gra");
}


A.6.2 train_net.c.


/***********************************************************************

Program: train_net.c

Description: This program is used to train a system based on KLT feature
   extraction and a neural net classifier. The grab routine is first
   called to collect the training images. After all images of a
   particular user have been collected, each of the images is
   preprocessed (centered and gaussian windowed). The preprocessed images
   are then used by klt_transform to create an average face and a user
   determined number of eigen faces. The coefficients module is then
   called to extract the klt coefficients from the training images.
   These coefficients are stored in a data file called klt.dat to be
   used by the neural network training algorithm.
   The neural network algorithm creates a weight file which will be used
   in the recognition phase. The outputs of this code are 1) the klt.dat
   file, 2) the setup file for the network, and 3) the weight file created
   by the network. All training images are stored in a folder called
   training_images for possible use in retraining the system at a later date.

Author: Ken Runyon

Date: 25 Sep 92

Modified by: John Keller

Date: 15 Jul 93

Modification Description:

 - Modified to allow multiple command line options.

 - Modified to function with the face/speaker fused verification system.
   Images will be grabbed by the program 'acquire' before this program is
   invoked. It will read prototype and image information from the file
   nograb_param.dat, then will pre-process the images if required. The
   basis set will be calculated, and then individual weight files will be
   produced by training the net for each user on the list user_list, with
   a particular user's files being assigned to class 1 while all the other
   files are assigned to class 2 and so on.

 - Modified to perform same function with speaker files as face files.
   Will allow processing of individual vectors from speaker codebook, or
   concatenation of raw speaker Cepstral vectors to allow retention of
   temporal relationships.

 - Modified to permit selection between three different dimension ordering
   schemes: eigenvalue (traditional KLT), figure of merit, and nonlinear
   transformation.

 - Modified to require external source.netup.mlp file be kept in same
   directory from which train_net is run. This allows reparameterization
   without recompiling.

***********************************************************************/

#include <stdio.h>
#include <string.h>
#include "vfc_lib.h"
#include "globals.h"
#include "jkmacros.h"

/********** Define neural net parameters ****************/

#define NUM_LAYRS 2
#define WT_SED    1918940490
#define PART_SED  1191645590
#define RNDM_SED  123456789
#define MAX_ITS   1600
#define OUT_INT   100
#define ETA_IN    0.15
#define ETA_OUT   0.3

#define  ETAJ  J  0.0 
#define  ALPHA  0.5 
#define BAT_SZ    1
#define TRAIN_PCT 1.0
#define NORM      1


void  voice_net_coefficients(), face_net_coefficients();

float flnumber, fl_waste;

int   i, j, k,
      temp,
      done,
      num_protos,
      user_coefs,
      num_coefs,
      num_train = 0,
      num_class,
      number1,
      number2,
      type,
      FACE,
      SPEAKER,
      NOPROCESS,
      FOM,
      int_waste,
      num_vectors,
      vectors_per_class,
      num_features,
      num_chunks,
      chunk_size,
      num_in_chunk,
      total_num_coefs,
      min_num_vectors,
      user_counter = 1,
      identity,
      count,
      leftover;

FILE  *fparam,
      *flist,
      *fweights,
      *fset,
      *ftable,
      *fnograb,
      *fdat,
      *fdat2,
      *fuser,
      *fsample,
      *handle,
      *nonlindata;

char  wt_file[10],
      msg[40],
      dat_file[10],

      u_name[8],
      nu_name[8],
      filename[20],
      waste[2],
      username[16],
      username2[16],
      user1[20],
      hid_nodes,
      chwaste[40],
      hid_nodes2,
      command[30];

main(int argc, char *argv[])
{
  if (argc == 1)
    {
      printf("\n\nSyntax: train_net <face, speaker> <fom, eigvalue, nonlin>[noprocess (only valid for face)]\n\n");
      exit(0);
    }

  if (strcmp("face", argv[1]) == 0)
    {
      NOPROCESS = 0;
      if (strcmp(argv[2], "fom") == 0) FOM = 1;
      else if (strcmp(argv[2], "eigvalue") == 0) FOM = 0;
      else if (strcmp(argv[2], "nonlin") == 0) FOM = 2;
      else
        {
          printf("\n\nSyntax: train_net <face, speaker> <fom, eigvalue, nonlin>[noprocess (only valid for face)]\n\n");
          exit(0);
        }

      if (argc == 4 && (strcmp(argv[3], "noprocess") == 0)) NOPROCESS = 1;
      else if (argc == 4 && (strcmp(argv[3], "noprocess") != 0))
        {
          printf("\n\nSyntax: train_net <face, speaker> <fom, eigvalue, nonlin>[noprocess (only valid for face)]\n\n");
          exit(0);
        }

      type = 1;
      FACE = 1;
      SPEAKER = 0;
    }

else  if  (8ticmp("speaker",  aigv[l])  ==  0) 

{ 

if  (strcmp(argv[2],  "fon")  ==  0)  FOM  =  1; 

else  if  (strcmp(argv[2],  "eigralue")  ==  0)  FOM  =  0; 

else  if  (stTcmp(atgv[2],  "nonlin")  ==  0)  FOM  =  2; 

else 
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{ 

printf("\n\nSyntax:  train.net  <lac«,  spaaker>  <loa,  eigTaltte,iiaiiliii>[iioproces8  (onlj 
valid  <or  laca)]\n\n"): 
exit(O); 

} 


type  =  0; 

FACE  =  0; 

SPEAKER  =  1; 

opeiuead(fu8er,  "apaaker.liat"); 

} 

else 

{ 

printf("\ii\iiSyntaz:  train_nat  <lace.  speakar>  <loa.  aigvalue,  nonli]i>Cnoproc«BS  (only  valid 
for  faca)]\n\n"); 
exit(O); 

} 

/***************If face processing, start here********************/

NOPROCESS = 1;

if (FACE == 1 && fopen("*.gra", "r") != NULL) system("rm *.gra");
if  (FACE  ==  1) 

{ 

    /***********Get the face info from the nograb_param.dat file*************/

    open_read(fnograb, "nograb_param.dat");
    fscanf(fnograb, "%d\n", &num_protos);
    fscanf(fnograb, "%d\n", &num_class);
    fscanf(fnograb, "%d", &num_train);
    fclose(fnograb);

    /***************Put together the list of training faces*********************/

    system("cp train_images/*.gra .");
    system("ls *.gra > face_list");
    open_read(flist, "face_list");

    if (NOPROCESS == 0) /*If pre-processing will be required...*/
    {
        /***************Check that the correlation files exist************/

        open_read(handle, "correlate.ref");
        open_read(handle, "wind.ref");

        /******************* Pre-process the images*********************/

        Loopli(num_train)
        {
            printf("\nNow pre-processing image # %d ...\r", i);
            fscanf(flist, "%s\n", filename);
            center(SM_WIDTH, "correlate.ref", filename);
            gwind(SM_WIDTH, filename);
            center(SM_WIDTH, "wind.ref", filename);
        }

        fclose(flist);
    }

    /************ Decide how many eigenvectors you need ***********/

    done = 0;
    while (!done)
    {
        printf("\nEnter the number of eigenfaces on which you want to train <%d>: ", num_train/3);
        scanf("%d", &user_coefs);
        gets(waste);
        printf("\n");

        if ((user_coefs > 0) && (user_coefs <= num_train))
        {
            num_coefs = user_coefs;
            done = 1;
        }
        else if (user_coefs == '\0')
        {
            num_coefs = num_train/3;
            done = 1;
        }
        else
            printf("\nYou need to train on at least 1 and at most %d eigenfaces\n", num_train);
    }

}

/*****************If speaker processing, start here*********************/

else if (SPEAKER == 1 && argc <= 3)
{
    system("rm eigenspeaker*");

    printf("\nEnter the number of eigenspeakers on which you wish to train: ");
    scanf("%d", &num_coefs);
    gets(waste);

    printf("\nEnter the number of individual eigenframes to concatenate: ");
    scanf("%d", &num_in_chunk);
    gets(waste);
}

else if (SPEAKER == 1 && argc > 3)
{
    num_coefs = atoi(argv[3]);
    num_in_chunk = atoi(argv[4]);
}

/************ Calculate the orthogonal basis set *****************/

if (FACE == 1)
{
    if (fopen("eigenfaces", "r") != NULL) system("rm eigenface*");

    if (FOM == 1) fom_routine("face_list", type, num_coefs, SM_WIDTH, num_train, num_class, num_protos);
    else if (FOM == 0) klt_routine("face_list", type, num_coefs, SM_WIDTH, num_train);
    else if (FOM == 2) /***Using non linear xformations**/
    {
        klt_routine("face_list", type, num_features, SM_WIDTH, num_train);
    }
}

else if (SPEAKER == 1)
{
    if (fopen("eigenspeakers", "r") != NULL)
        system("rm eigenspeaker*");
    if (FOM == 1)
        fom_routine("speaker_list", type, num_coefs);
    else if (FOM == 0)
        klt_routine("speaker_list", type, num_coefs);
    else if (FOM == 2)
    {
        int_waste = 20;
        klt_routine("speaker_list", type, int_waste, num_coefs);
    }
}

num_class = 2; /****Classes are either Joe or Not-Joe************/

/******** Create the lookup table for the neural network ******/

open_write(ftable, "lookup");
fprintf(ftable, "%8s\n", "invalid");
fprintf(ftable, "%8s\n", "valid");
fclose(ftable);

if (FACE == 1)
{
    open_read(flist, "face_list");
    open_write(fuser, "user_face_list");
}

else if (SPEAKER == 1)
{
    open_read(flist, "speaker_list");
    fscanf(flist, "%s", chwaste);
    fscanf(flist, "%d", &num_train);
    fscanf(flist, "%d", &vectors_per_class);
    fscanf(flist, "%d", &num_features);
    open_write(fuser, "user_speaker_list");
}

strcpy(username2, "");

Loopli(num_train)
{
    fscanf(flist, "%s\n", filename);
    strcpy(username, filename);
    j = 0;

    while (username[j] != '\0')
    {
        if (isalpha(username[j]))
            j++;
        else
            username[j] = 0;
    }

    if (strcmp(username2, username) != 0)
    {
        strcpy(username2, username);
        fprintf(fuser, "%s\n", username2);
    }
}

fclose(fuser);
fclose(flist);

/************ Create the data file for the neural network ************/

if (FACE == 1)
{
    type = 1;

    open_read(flist, "face_list");
    open_write(fweights, "klt_f.dat");
    fprintf(fweights, "%d\n%d\n", num_coefs, num_class);
    system("rm eigcoeffs");

    Loopli(num_train)
    {
        fscanf(flist, "%s\n", filename);
        if (FOM != 2)
            face_net_coefficients(SM_WIDTH, num_coefs, filename, fweights, ftable, num_class, FOM, type);

        /*************************************************************************
        For the nonlinear transformation, first use 'net_coefficients' to extract
        all the eigencoefficients from the training set, then use 'xnetpush' to
        transform these coefficients into num_coefs dimensions for each prototype.
        *************************************************************************/

        else if (FOM == 2)
            face_net_coefficients(SM_WIDTH, num_train, filename, fweights, ftable, num_class, FOM, type);
    }

    fclose(fweights);
    if (FOM == 2)
    {
        open_read(fweights, "klt_f.dat");
        open_write(nonlindata, "facetest.dat");

        fprintf(nonlindata, "%d\n%d\n%d\n", num_train, num_train/num_protos, num_protos);
        Loopli(2)
            fscanf(fweights, "%d", &int_waste);

        Loopli(num_train)
        {
            fscanf(fweights, "%d", &int_waste);

            Looplj(num_train) /***Num of features***/
            {
                fscanf(fweights, "%f ", &fl_waste);
                fprintf(nonlindata, "%f ", fl_waste);
            }

            Looplj(2)
                fscanf(fweights, "%f", &fl_waste);
            fprintf(nonlindata, "\n");
        }

        fclose(fweights);
        fclose(nonlindata);

        sprintf(command, "make_bpnet %d", num_coefs);
        system(command);

        system("xnetpush facetest.dat");
        open_read(nonlindata, "newtest.dat");
        open_write(fweights, "klt_f.dat");

        fprintf(fweights, "%d\n%d\n", num_coefs, num_class);

        Loopli(num_train)
        {
            fprintf(fweights, "%d ", i - 1); /***Exemplar***/

            Looplj(num_coefs)
            {
                fscanf(nonlindata, "%f ", &fl_waste);
                fprintf(fweights, "%f ", fl_waste);
            }

            fprintf(fweights, "0.90000 0.90000\n");
        }

        fclose(fweights);
        fclose(nonlindata);
    }

    /*************************************************************************
    This is the end of the nonlinear xformation part. The klt_f.dat file
    has been rebuilt with 'num_coefs' number of new features for each
    prototype.
    *************************************************************************/

}

else if (SPEAKER == 1)
{
    open_read(fuser, "user_speaker_list");
    min_num_vectors = 1000;

    Loopli(num_train)
    {
        fscanf(fuser, "%s", username);
        sprintf(msg, "%s.trainspeech", username);
        open_read(flist, msg);
        fscanf(flist, "%d", &temp);
        if (temp < min_num_vectors)
            min_num_vectors = temp;
        fclose(flist);
    }

    rewind(fuser);

    Loopli(num_train)
    {
        fscanf(fuser, "%s", username);

        sprintf(msg, "%s.trainspeech", username);
        open_read(flist, msg);
        sprintf(msg, "%s.klt_s.dat", username);
        open_write(fweights, msg);
        if (FOM != 2)
            voice_net_coefficients(num_features, num_coefs, fweights, flist, num_class, min_num_vectors);
        else
            voice_net_coefficients(num_features, num_features, fweights, flist, num_class, min_num_vectors);
        fclose(fweights);
    }

    rewind(fuser);

    open_write(fweights, "klt_s.dat");
    fprintf(fweights, "%d\n%d\n", num_coefs, num_class);

    Loopli(num_train)
    {
        fscanf(fuser, "%s", username);

        sprintf(msg, "%s.klt_s.dat", username);
        open_read(handle, msg);

        Looplj(min_num_vectors)
        {
            Looplk(num_coefs)
            {
                fscanf(handle, "%f ", &flnumber);
                fprintf(fweights, "%f ", flnumber);
            }

            fprintf(fweights, "\n");
        }

        fclose(handle);
    }

    fclose(fweights);
    fclose(fuser);


    /***********************************************************************
    We now have the klt_s.dat file containing an equal number of sets of
    klt coefficients for each user. We next will rewrite the klt_s.dat
    file to represent concatenated "chunks" of vectors. If using the
    actual codebook vectors rather than the source speaker data, we don't
    need to do this; we'll want each "chunk" equal to 1.
    ***********************************************************************/

    num_chunks = (int) (min_num_vectors/num_in_chunk);
    total_num_coefs = num_in_chunk * num_coefs;
    open_read(fweights, "klt_s.dat");


    open_write(handle, "temp.dat");

    Loopli(2)
    {
        fscanf(fweights, "%d", &temp);
        fprintf(handle, "%d\n", temp);
    }

    count = 0;

    Looplk(num_train)
    {
        Loopli(num_chunks)
        {
            fprintf(handle, "%d ", count);

            Looplj(total_num_coefs)
            {
                fscanf(fweights, "%f ", &flnumber);
                fprintf(handle, "%f ", flnumber);
            }

            fprintf(handle, "0.10000 0.10000\n");
            count++;
        }

        /* Skip the vectors that do not fill a whole chunk. */
        leftover = (min_num_vectors * num_coefs) - (num_chunks * total_num_coefs);
        Looplj(leftover) fscanf(fweights, "%f ", &fl_waste);
    }

    fclose(handle);
    fclose(fweights);

    system("mv temp.dat klt_s.dat");


    /*************************************************************************
    This is the beginning of the non-linear transformation portions for
    speakers. It functions the same as described above for faces.
    *************************************************************************/


    if (FOM == 2)
    {
        open_read(fweights, "klt_s.dat");
        open_write(nonlindata, "spkrtest.dat");
        num_protos = 64;

        fprintf(nonlindata, "%d\n%d\n%d\n", num_features, num_train, num_protos);
        Loopli(2)
            fscanf(fweights, "%d", &int_waste);

        Loopli(num_train)
        {
            fscanf(fweights, "%d", &int_waste);

            Looplj(num_train)
            {
                fscanf(fweights, "%f ", &fl_waste);
                fprintf(nonlindata, "%f ", fl_waste);
            }

            Looplj(2)
                fscanf(fweights, "%f %f", &fl_waste);
            fprintf(nonlindata, "\n");
        }

        fclose(fweights);
        fclose(nonlindata);

        sprintf(command, "make_bpnet %d", num_coefs);
        system(command);
        system("xnetpush spkrtest.dat");
        open_read(nonlindata, "newtest.dat");
        open_write(fweights, "klt_f.dat");
        fprintf(fweights, "%d\n%d\n", num_coefs, num_class);
        Loopli(num_train)
        {
            fprintf(fweights, "%d ", i - 1);

            Looplj(num_coefs)
            {
                fscanf(nonlindata, "%f", &fl_waste);
                fprintf(fweights, "%f ", fl_waste);
            }

            fprintf(fweights, "0.90000 0.90000\n");
        }

        fclose(fweights);
        fclose(nonlindata);
    }

    /***********************************************************************
    This is the end of the nonlinear xformation part. The klt_s.dat file
    has been rebuilt with 'eigcoeffs' number of new features for each
    prototype.
    ***********************************************************************/

} /***End 'if SPEAKER == 1' clause***/


/***********************************************************************
So now the file klt_s.dat contains rows of eigen-coefficients
corresponding to the prototypes for each user.
***********************************************************************/

/*****Create the train_params file for the face verification phase *****/

if (FACE == 1)
{
    open_write(fparam, "train_f_params");
    fprintf(fparam, "%d\n%d\n%d\n%d\n%d\n", SM_WIDTH, num_coefs, num_train, num_protos, num_class);
}

else if (SPEAKER == 1)
{
    open_write(fparam, "train_s_params");
    fprintf(fparam, "%d\n%d\n%d\n%d\n", num_coefs, num_class, num_in_chunk, num_train - 1);
}

fclose(fparam);

/***************Now train the net for each user******************/
if (FACE == 1)
    strcpy(filename, "user_face_list");
else if (SPEAKER == 1)
    strcpy(filename, "user_speaker_list");

open_read(fuser, filename);
if (fopen("temp.dat", "r") != NULL)
    system("rm temp.dat");

while (fscanf(fuser, "%s", username) != EOF)
{
    printf("\n\n\n\n\n Training the net for user %s.\n", username);


    /***********************************************************************
    Now copy the klt.dat file created during the (voice_)net_coefficients program
    to a new file, replacing the last two numbers in each vector with 0.1 and 0.9,
    depending on which class the particular image (speaker) belongs to. Remember,
    there are only two classes: either the vector is a particular user,
    or it's one of the other users.
    ***********************************************************************/

    if (FACE == 1)
    {
        open_read(flist, "face_list");
        open_read(fdat, "klt_f.dat");
        open_write(fdat2, "temp.dat");
        fscanf(fdat, "%d\n%d\n", &number1, &number2);
        fprintf(fdat2, "%d\n%d\n", number1, number2);

        Loopli(num_train)
        {
            fscanf(fdat, "%d", &number1);
            fprintf(fdat2, "%d ", number1);

            Looplj(num_coefs)
            {
                fscanf(fdat, "%f ", &flnumber);
                fprintf(fdat2, "%f ", flnumber);
            }

            fscanf(fdat, "%f ", &flnumber);
            fscanf(fdat, "%f ", &flnumber);

            fscanf(flist, "%s\n", filename);
            strcpy(user1, filename);

            /*******************************************************************
            Parse the current image's filename to determine whether it is the
            same as the user currently being trained on.
            *******************************************************************/

            j = 0;

            while (user1[j] != '\0')
            {
                if (isalpha(user1[j]))
                    j++;
                else
                    user1[j] = 0;
            }

            if (strcmp(user1, username) == 0)
            {
                fprintf(fdat2, "0.900000 0.100000\n");
            }
            else
                fprintf(fdat2, "0.100000 0.900000\n");
        }

    } /***End of 'if (FACE == 1)' clause ***/

    else if (SPEAKER == 1)
    {
        open_read(fdat, "klt_s.dat");
        open_write(fdat2, "temp.dat");
        fscanf(fdat, "%d\n%d\n", &number1, &number2);
        fprintf(fdat2, "%d\n%d\n", total_num_coefs, number2);

        Loopli(num_train)
        {
            if (i == user_counter) identity = 1;
            else identity = 0;

            Looplj(num_chunks)
            {
                fscanf(fdat, "%d", &number1);
                fprintf(fdat2, "%d ", number1);

                Looplk(total_num_coefs)
                {
                    fscanf(fdat, "%f ", &flnumber);
                    fprintf(fdat2, "%f ", flnumber);
                }

                if (identity == 1)
                    fprintf(fdat2, "0.900000 0.100000\n");
                else
                    fprintf(fdat2, "0.100000 0.900000\n");

                Looplk(2)
                    fscanf(fdat, "%f ", &flnumber);
            }
        }
    }

    user_counter++;

    fclose(fdat);
    fclose(fdat2);
    fclose(flist);
    if (SPEAKER == 1) system("mv temp.dat klt_s.dat");
    else if (FACE == 1) system("mv temp.dat klt_f.dat");

    /****************Assign user name to weights file*******************/

    if (FACE == 1) sprintf(wt_file, "%s_f_klt.wts", username);
    else if (SPEAKER == 1) sprintf(wt_file, "%s_s_klt.wts", username);

    /************ Create the setup file for the neural network **********/

    if (SPEAKER == 1)
    {
        num_coefs = total_num_coefs;
        strcpy(dat_file, "klt_s.dat");
    }
    else
        strcpy(dat_file, "klt_f.dat");
    hid_nodes = 2 * num_coefs;
    hid_nodes2 = 0;

    open_read(fsample, "source_setup.alp");
    open_write(fset, "setup.alp");

    Loopli(4)
    {
        fscanf(fsample, "%d", &int_waste);
        fprintf(fset, "%d\n", int_waste);
    }

    fprintf(fset, "%s -store weights\n", wt_file);

    fscanf(fsample, "%d", &int_waste); /***Max iterations***/
    fprintf(fset, "%d\n", int_waste);

    fprintf(fset, "%d %d %d %d\n", num_coefs, hid_nodes, hid_nodes2, num_class);
    fprintf(fset, "%s -data\n", dat_file);

    fscanf(fsample, "%d", &int_waste);
    fprintf(fset, "%d\n", int_waste);

    Loopli(4)
    {
        fscanf(fsample, "%f ", &fl_waste);
        fprintf(fset, "%f\n", fl_waste);
    }

    fprintf(fset, "%d\n%f\n%d\n", BAT_SZ, TRAIN_PCT, NORM);

    fclose(fset);
    fclose(fsample);

    system("alp_fuse_tm");

} /*******Closing while loop*****/

fclose(fuser);

if (FACE == 1) system("rm *.gra");

printf("\nTRAINING IS COMPLETE\n");

}
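
The relabeling step near the end of the listing above (one "Joe / not Joe" target pair per exemplar) can be sketched in isolation. This is only an illustration, not part of the thesis code: the helper names owner_from_filename and two_class_targets are hypothetical, and the sketch assumes, as the isalpha loop in the listing implies, that each file name starts with the user's name followed by a non-letter.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy the leading alphabetic run of a file name
   (for example "joe12.gra" gives "joe") into owner, truncating to fit. */
void owner_from_filename(const char *filename, char *owner, size_t size)
{
    size_t j = 0;
    assert(size > 0);
    while (j + 1 < size && isalpha((unsigned char)filename[j])) {
        owner[j] = filename[j];
        j++;
    }
    owner[j] = '\0';
}

/* Hypothetical helper: fill in the two desired network outputs for one
   exemplar. 0.9/0.1 marks the user being trained, 0.1/0.9 anyone else. */
void two_class_targets(const char *owner, const char *user, float target[2])
{
    int same = (strcmp(owner, user) == 0);
    target[0] = same ? 0.9f : 0.1f;
    target[1] = same ? 0.1f : 0.9f;
}
```

Calling two_class_targets once per exemplar, for each user in turn, reproduces the per-user data files that the training loop writes before invoking the network.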
A.6.3 face_net_coefficients.c.


/*********************************************************
Program: face_net_coefficients.c

Description: This program maps a test face onto the set of eigenfaces and stores the KL coefficients in
train.coefs in a format the neural network can read.

Author: Pedro Suarez (Originally recon.c)

Date: 24 July 91

Modified by: Ken Runyon (Chopped off reconstruction)

Date: 22 Jun 92

Modification Description: I decided we didn't need to actually reconstruct and store a face. I also made
the stand alone program into a module which is called by thesis.

*********************************************************/

#include <stdio.h>
#include <math.h>
#include "jkmacros.h"

void face_net_coefficients(dimension, num_coefs, infilename, outfile, classfile, num_class, fom, type)
int dimension,
    num_coefs,
    num_class,
    fom,
    type;

char infilename[];

FILE *outfile,
     *classfile;

{

FILE *face1, *eigenin, *train, *face_avg, *testfile;

int i, j, l, N, M, atoi();

static int count = 0, exemplar = 0;

float *vector(), **matrix(), *average_face, **u, *pedro, *reconface;
float *w, *I;

char filename[81], *strcpy(), user[15], ext[10];
static char user1[15], user2[15];

#ifndef RESULTS
printf("\nPulling Coefficients for %s\n", infilename);
#endif


/******************* Set Up Files ***********************/

/*** Open Test Face ***/

if ((face1=fopen(infilename,"r")) == NULL){
    printf("I can't open the input file");
    exit(-1);
}

/*** Open Avg Face ***/

if ((face_avg=fopen("avg_face.dat","r")) == NULL){
    printf("I can't open avg_face.dat.");
    exit(-1);
}

/****** set up matrices ******/

N = dimension * dimension;
M = num_coefs;

u = matrix(1, N, 1, M);
pedro = vector(1, N);
average_face = vector(1, N);
reconface = vector(1, N); /*** DO I NEED THIS? ***/
w = vector(1, M);
I = vector(1, N);

/****** Initialize Matrices ******/

for(j=1; j<=M; j++)
    for(i=1; i<=N; i++)
        w[j]=u[i][j]=I[i]=pedro[i]=reconface[i]=average_face[i]=0.0;

/********* Load the Test Face into the Pedro Vector *********/

for(j=1; j<=N; j++)
    fscanf(face1, "%f\n", &pedro[j]);
fclose(face1);

/**** Load the Average Face into the Average_Face Vector ****/
for(j=1; j<=N; j++)
    fscanf(face_avg, "%f\n", &average_face[j]);
fclose(face_avg);

/****** Load the Eigenfaces into Matrix U ******/

open_read(train, "face.train.out");

for(j=1; j<=M; j++){
    fscanf(train, "%s\n", filename);
    eigenin = fopen(filename, "r");
    for(i=1; i<=N; i++){
        fscanf(eigenin, "%f\n", &u[i][j]);
    }
    fclose(eigenin);
}

fclose(train);

/****** Subtract the Average Face from the Test Face ******/

for(i=1; i<=N; i++)
    I[i] = pedro[i] - average_face[i];

/****** Calculate the KL Coefficients ******/

for(j=1; j<=M; j++)
    for(i=1; i<=N; i++)
        w[j] = u[i][j] * I[i] + w[j];

/****** Write an exemplar number to the file ******/

fprintf(outfile, "%d ", exemplar);
exemplar++;

/****** Write the Coefficients to the *.coefs File ******/
/*******Write test file containing just the eigcoeffs******/
open_append(testfile, "eigcoeffs");

for(i=1; i<=M; i++)
{
    fprintf(outfile, "%f ", w[i]);
    fprintf(testfile, "%f ", w[i]);
}

fprintf(testfile, "\n");
fclose(testfile);

/****** Write the desired outputs to the *.coefs File ******/

for(l=num_class; l >= 1; l--)
    fprintf(outfile, "%f ", 0.90000);

fprintf(outfile, "\n");
free_matrix(u, 1, N, 1, M);

} /* end coefficients.c */
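
The core of the routine above is the projection of a mean-removed sample onto each basis vector, w[j] = sum over i of u[i][j] * (x[i] - mean[i]). A minimal sketch of just that step follows. It is an illustration rather than the thesis code: the function name kl_project is hypothetical, and it assumes 0-based, row-major arrays instead of the 1-based vector() and matrix() allocators used in the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Project one sample onto m basis vectors.
   basis is row-major with n rows and m columns, so basis[i * m + j] is
   component i of basis vector j (u[i][j] in the listing, shifted to 0-based). */
void kl_project(const float *sample, const float *mean, const float *basis,
                size_t n, size_t m, float *coef)
{
    size_t i, j;
    assert(sample && mean && basis && coef);
    for (j = 0; j < m; j++) {
        coef[j] = 0.0f;   /* reset the accumulator for every coefficient */
        for (i = 0; i < n; i++)
            coef[j] += basis[i * m + j] * (sample[i] - mean[i]);
    }
}
```

Resetting coef[j] inside the function keeps repeated calls independent of one another, which matters when many samples are projected in a loop.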
A.6.4 voice_net_coefficients.c.


/*********************************************************

Program: voice_net_coefficients.c

Description: This program will take a set of voice codebooks, extract the KLT coefficients of each, and
write them to the file klt.dat for use by the net.

Author: John G. Keller (based on the program "recon.c" originally written by Pedro Suarez and
modified by Ken Runyon).

Date: 22 Aug 93

*********************************************************/

#include <stdio.h>
#include <math.h>
#include "jkmacros.h"

void voice_net_coefficients(length, num_coefs, datfile, userfile, num_class, num_train)
int length,
    num_coefs,
    num_class,
    num_train;

FILE *datfile,
     *userfile;

{

FILE *code, *eigenin, *train, *avg_voice;

int i, j, k;

int exemplar = 0,
    waste;

float *vector(),
      **matrix(),
      *average_voice,
      **u,
      *voice,
      *w,
      *I;

char filename[81];

/****** set up matrices ******/

u = matrix(1, length, 1, num_coefs);
voice = vector(1, length);
average_voice = vector(1, length);
w = vector(1, num_coefs);
I = vector(1, length);

/****** Initialize Matrices ******/

Looplij(num_coefs, length)
    w[i]=u[j][i]=I[j]=voice[j]=average_voice[j]=0.0;

/********* Load the average voice into the average_voice vector ****/

open_read(avg_voice, "avg_speaker.dat");

Loopli(length)
    fscanf(avg_voice, "%f\n", &average_voice[i]);

fclose(avg_voice);

/****** Load the Eigenspeakers into matrix U ******/

open_read(train, "speaker.train.out");

Loopli(num_coefs)
{
    fscanf(train, "%s\n", filename);
    open_read(eigenin, filename);

    Looplj(length)
    {
        fscanf(eigenin, "%f\n", &u[j][i]);
    }

    fclose(eigenin);
}

fclose(train);

/**********************************************************
Now loop through the data, taking one full vector at a time and
pulling the klt coefficients.
**********************************************************/

Loopli(2) fscanf(userfile, "%f ", &waste);

Looplk(num_train)
{

    /********* Load the speaker vector into voice vector *********/
    Loopli(length)
        fscanf(userfile, "%f ", &voice[i]);

    /****** Subtract off the average voice ******/
    Loopli(length)
        I[i] = voice[i] - average_voice[i];

    /****** Calculate the KL Coefficients ******/
    Loopli(num_coefs)
        w[i] = 0.0; /* clear every accumulator before each new vector */

    Looplij(num_coefs, length)
        w[i] = u[j][i] * I[j] + w[i];

    /****** Write the Coefficients to <user>.klt_s.dat ******/
    Loopli(num_coefs)
        fprintf(datfile, "%f ", w[i]);

    fprintf(datfile, "\n");
}

fclose(userfile);

free_matrix(u, 1, length, 1, num_coefs);
free_vector(voice, 1, length);
free_vector(average_voice, 1, length);
free_vector(w, 1, num_coefs);
free_vector(I, 1, length);

}
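
The rows written by this routine are later regrouped by train_net into fixed-size "chunks" of concatenated coefficient vectors. The bookkeeping for that regrouping can be sketched as follows. The struct chunk_plan and the function plan_chunks are hypothetical names introduced only for this illustration; the arithmetic mirrors the num_chunks, total_num_coefs, and leftover expressions in the train_net listing.

```c
#include <assert.h>

/* Hypothetical summary of how one speaker's vectors are grouped. */
struct chunk_plan {
    int num_chunks;       /* whole chunks available per speaker        */
    int coefs_per_chunk;  /* network inputs per chunk                  */
    int leftover;         /* coefficients skipped at the end of a user */
};

struct chunk_plan plan_chunks(int min_num_vectors, int num_coefs, int num_in_chunk)
{
    struct chunk_plan p;
    assert(num_in_chunk > 0);
    p.num_chunks = min_num_vectors / num_in_chunk;   /* integer division */
    p.coefs_per_chunk = num_in_chunk * num_coefs;
    p.leftover = min_num_vectors * num_coefs - p.num_chunks * p.coefs_per_chunk;
    return p;
}
```

For example, 10 vectors of 4 coefficients grouped 3 at a time give 3 chunks of 12 inputs each, with 4 coefficients left unread.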

A.6.5 klt_routine.c.

/***********************************************************************

Program: klt_routine.c

Description: This routine is based around the Jacobi rotation routine
found in Numerical Recipes in C. Given a data matrix, it will first calculate
the covariance matrix, and then the eigenvectors. The eigenvector matrix
is then put into descending eigenvalue order and returned to the calling
program. The eigenvectors may also be printed out here by uncommenting
the appropriate lines. The routine can be used with either face or speaker data.

Author: John G. Keller/Dennis Krepp

Date: 1 Sep 93

***********************************************************************/

#include <stdio.h>
#include <math.h>
#include <string.h>
#include "jkmacros.h"

klt_routine(char file_list[], int type, int num_eigvectors, int dim, int num_train, int increment)

{

FILE *datfile, *outfile, *average, *train, *code;
int length,
    waste,
    num_codewords,
    num_classes,
    num_keep_dimensions,
    dim_number;

int i, j, N, k, M,
    nrot;

float **matrix(),
      *vector(),
      **A,
      **A_trans,
      **u,
      **L,
      **v,
      *d,
      *average_temp,
      temp;

void free_vector(), free_matrix(), eigsrt(), jacobi();

char type_name[30],
     avg_file[30],
     msg[30],
     msg1[30],
     filename[40],
     file[40],
     data_filename[40];

open_read(train, file_list);

if (type == 0) /***If speaker verification***/
{
    fscanf(train, "%s", data_filename);
    fscanf(train, "%d", &num_classes);
    fscanf(train, "%d", &num_codewords);
    fscanf(train, "%d", &length);
    num_train = num_classes * num_codewords;
}

else if (type == 1) /***If face verification***/
{
    length = dim * dim;
}

i = 0;

strcpy(type_name, "");

while (file_list[i] != '.' || file_list[i] != '_')
{
    if (file_list[i] == '.' || file_list[i] == '_')
    {
        type_name[i] = 0;
        break;
    }

    type_name[i] = file_list[i];
    i++;
}

/****** Allocate memory ******/

A_trans = matrix(1, num_train, 1, length);
A = matrix(1, length, 1, num_train);
average_temp = vector(1, length);

if (type == 0) /***Speaker***/
{
    L = matrix(1, length, 1, length);
    d = vector(1, length);
    v = matrix(1, length, 1, length);
}

else if (type == 1)
{
    L = matrix(1, num_train, 1, num_train);
    d = vector(1, num_train);
    v = matrix(1, num_train, 1, num_train);
}


/****** Initialize matrices and vectors ******/

if (type == 0) /***If speaker***/
{
    Loopli(num_train)
    {
        Looplj(length)
            A_trans[i][j] = A[j][i] = 0.0;
    }

    Loopli(length)
    {
        d[i] = average_temp[i] = 0.0;

        Looplj(length)
            L[i][j] = v[i][j] = 0.0;
    }
}

else if (type == 1)
{
    Loopli(num_train)
    {
        d[i] = average_temp[i] = 0.0;

        Looplk(num_train)
            v[i][k] = L[i][k] = 0.0;

        Looplj(length)
            A_trans[i][j] = A[j][i] = 0.0;
    }
}

if (type == 0) /***Speaker***/
{
    printf("\nThe users being trained on are :\n\n");
    open_read(code, data_filename);

    Loopli(num_train)
    {
        if (i == 1 || ((num_codewords + 1) % i) == 0)
        {
            fscanf(train, "%s\n", filename);
            printf("\t\t%s\n", filename);
        }

        Looplj(length)
        {
            fscanf(code, "%f", &A[j][i]);
        }
    }
}

else if (type == 1)
{
    printf("The files being trained on are :\n\n");
    Loopli(num_train)
    {
        fscanf(train, "%s\n", filename);
        printf("\t\t%s\n", filename);
        open_read(code, filename);

        Looplj(length)
            fscanf(code, "%f\n", &A[j][i]);
        fclose(code);
    }
}

if (type == 0) fclose(code);
fclose(train);

/***************Calculate average vector*******************/

sprintf(avg_file, "avg_%s.dat", type_name);
open_write(average, avg_file);

Loopli(length)
{
    temp = 0.0;

    Looplj(num_train)
    {
        temp += A[i][j];
    }

    average_temp[i] = temp/num_train;
    fprintf(average, "%f\n", average_temp[i]);
}

fclose(average);

/**************Subtract average vector***********************/

Looplj(num_train)
    Loopli(length)
        A[i][j] = A[i][j] - average_temp[i];
free_vector(average_temp, 1, length);

/*************Make transpose matrix************************/
Looplj(num_train)
    Loopli(length)
        A_trans[j][i] = A[i][j];

/******************Matrix multiply A by itself***************/

if (type == 0) /***Speaker***/
{
    Loopli(length)
        Looplj(length)
        {
            temp = 0.0;

            Looplk(num_train)
                temp = temp + A_trans[k][i] * A[j][k];

            L[i][j] = temp;
        }
}

else if (type == 1)
{
    Loopli(num_train)
        Looplj(num_train)
        {
            temp = 0.0;

            Looplk(length)
                temp = temp + A_trans[i][k] * A[k][j];

            L[i][j] = temp;
        }
}

free_matrix(A_trans, 1, num_train, 1, length);

/***************Do Jacobi rotation and sort eigenstuff***********/
if (type == 0) /***Speaker***/
{
    jacobi(L, length, d, v, &nrot);
    eigsrt(d, v, length);
}

else if (type == 1)
{
    jacobi(L, num_train, d, v, &nrot);
    eigsrt(d, v, num_train);
}

/*****************Find eigenbases*******************************/

u = matrix(1, length, 1, num_train);

Loopli(num_train)
    Looplj(length)
        u[j][i] = 0.0;

if (type == 0) /***Speaker***/
{
    Loopli(num_train)
        Looplj(length)
            Looplk(length)
                u[j][i] = v[k][i] * A[k][i] + u[j][i];
}

else if (type == 1)
{
    Loopli(num_eigvectors)
        Looplj(num_train)
            Looplk(length)
                u[k][i] = v[j][i] * A[k][j] + u[k][i];
}

/*********** Write file containing list of eigenvector names*********/

sprintf(msg, "%s.train.out", type_name);
open_write(outfile, msg);
sprintf(msg1, "eigen%s", type_name);

Loopli(num_eigvectors)
{
    sprintf(file, "%s%d.dat", msg1, i);
    fprintf(outfile, "%s\n", file);
}

fclose(outfile);

/***********Write out the eigenbases******************************/

open_read(outfile, msg);

Loopli(num_eigvectors)
{
    fscanf(outfile, "%s", file);
    open_write(datfile, file);

    Looplj(length)
        fprintf(datfile, "%g\n", u[j][i]);

    fclose(datfile);
}

fclose(outfile);

free_matrix(A, 1, length, 1, num_train);
free_matrix(u, 1, length, 1, num_train);
if (type == 0)
{
    free_matrix(L, 1, length, 1, length);
    free_matrix(v, 1, length, 1, length);
    free_vector(d, 1, length);
}

else if (type == 1)
{
    free_matrix(L, 1, num_train, 1, num_train);
    free_matrix(v, 1, num_train, 1, num_train);
    free_vector(d, 1, num_train);
}

}
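
In the face branch of the routine above, the matrix handed to jacobi() is the small num_train by num_train product of the transposed data matrix with itself, not the much larger length by length covariance matrix. Each eigenvector v of the small matrix is then mapped back through the data matrix. This works because if (A^T A) v = lambda v, then (A A^T)(A v) = lambda (A v). A minimal sketch of the two steps follows. It is an illustration only: the names gram_matrix and lift_eigenvector are hypothetical, the arrays are 0-based and row-major rather than Numerical Recipes style, the mean is assumed to have been removed already, and no normalization of the result is attempted.

```c
#include <assert.h>
#include <stddef.h>

/* a is row-major with n rows (features) and t columns (training vectors).
   Computes the t-by-t matrix l = a^T a. */
void gram_matrix(const float *a, size_t n, size_t t, float *l)
{
    size_t i, j, k;
    assert(a && l);
    for (i = 0; i < t; i++)
        for (j = 0; j < t; j++) {
            float sum = 0.0f;
            for (k = 0; k < n; k++)
                sum += a[k * t + i] * a[k * t + j];
            l[i * t + j] = sum;
        }
}

/* Maps one eigenvector v (length t) of a^T a to u = a v (length n),
   which is an eigenvector of a a^T with the same eigenvalue. */
void lift_eigenvector(const float *a, size_t n, size_t t,
                      const float *v, float *u)
{
    size_t j, k;
    assert(a && v && u);
    for (k = 0; k < n; k++) {
        u[k] = 0.0f;
        for (j = 0; j < t; j++)
            u[k] += a[k * t + j] * v[j];
    }
}
```

With only a few dozen training images of many thousand pixels each, diagonalizing the small matrix is far cheaper than working in pixel space, which is presumably why the face and speaker branches are handled differently.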

A.6.6 fom_routine.c.

/***************************************************************************

Program: fom_routine.c


Description:  The  purpose  of  this  program  is  to  allow  a  comparison 

between  the  eigenvectors  with  the  highest  eigenvalues  (found  during 
the  KL  transformation)  and  the  in-class  importance  of  those  eigen- 
dimensions  when  each  class  is  individually  projected  into  the 
eigenspace.  The  program  will: 

-  read in data from a source file containing multiple classes and
multiple  data  points  of  some  dimension  n 

-  read in an eigenspace defined (via the klt process) from the
raw  data  points 

-  calculate  the  in-class  and  across-class  means  and  variances 
of  the  eigen-dimensions 

-  calculate  a  figure  of  merit  (FOM)  for  each  set  of  eigen- 
dimensions 

The  program  requires  a  user-specified  input  data  file  containing  class  data.  The  first  three  lines  of 
the  data  file  must  contain  the  number  of  classes  within  the  file,  the  number  of  vectors  per  class,  and  the 
dimensionality  of  each  vector.  The  listing  of  vectors,  by  class,  will  follow. 

The  output  of  the  program  will  be  a  listing  of  the  above 
statistics.  The  output  eigen-dimensions  will  be  listed  in  order  of 
decreasing  eigenvalues,  and  we  will  be  able  to  make  a  direct  comparison 
between  the  FOMs  and  those  eigenvalues  to  determine  which  eigen-dimensions 
appear  to  be  most  important  for  the  given  data  set. 

AutJior:  John  G.  Keller 

Date:  29  Aug  93 

#indade  <stdio.h> 

#include  <math.h> 

#include  <string.h> 

#include  "nrutil.h" 

#inclade  "jknacros.h" 


foin-routine(char  iileJist|],  int  type,  int  num.coefis,  int  dim,  int  num.train,  int  start. classes,  int  num-protos) 

{
char command[40], msg[40], msg1[40], data_filename[40], type_name[30], file[40];
int vectors_per_class,
    num_eigcoefs,
    num_eigvectors,
    num_ftrs,
    total_vectors,
    i, j, k;

float temp1,
      temp2,
      sum_eigvalue,
      sum_fom,
      *fom,
      *across_class_variance,
      *across_class_mean,
      *mean_vector,
      *eigvalue,
      *fom_ordered_vector,
      *mean_of_var,
      *var_of_means,
      *fom_error,
      *cum_fom_error,
      *eig_error,
      *cum_eig_error,
      **eigenmatrix,
      **fom_ordered_eigmatrix,
      **data_matrix,
      **newdata_matrix,
      **class_mean,
      **class_variance,
      **u;

FILE *datfile, *outfile, *fhandle, *train, *avg_file;

/***************************************************************************
Read in initialization data from source files
***************************************************************************/

open_read(train, file_list);

if (type == 0)  /***If speaker verification***/
{
    fscanf(train, "%s", data_filename);
    fscanf(train, "%d", &num_classes);
    fscanf(train, "%d", &vectors_per_class);
    fscanf(train, "%d", &num_ftrs);
    num_eigvectors = num_eigcoefs = num_ftrs;
}
else if (type == 1)  /***If face verification***/
{
    num_ftrs = dim * dim;
    num_eigvectors = num_eigcoefs = num_train;
    num_classes = start_classes;
    vectors_per_class = num_protos;
}

i = 0;
strcpy(type_name,
while (file_list[i] != '.' || file_list[i] !=
{
    if (file_list[i] == '.' || file_list[i] ==
    {
        type_name[i] = 0;
        break;
    }
    type_name[i] = file_list[i];
    i++;
}

total_vectors = num_classes * vectors_per_class;


/***************************************************************************
Declare/initialize matrices and vectors
***************************************************************************/

fom = vector(1, num_eigvectors);
across_class_variance = vector(1, num_classes);
across_class_mean = vector(1, num_classes);
mean_vector = vector(1, num_ftrs);
eigvalue = vector(1, num_eigvectors);
mean_of_var = vector(1, num_ftrs);
var_of_means = vector(1, num_ftrs);
fom_ordered_vector = vector(1, num_ftrs);
eig_error = vector(1, num_ftrs);
cum_eig_error = vector(1, num_ftrs);
fom_error = vector(1, num_ftrs);
cum_fom_error = vector(1, num_ftrs);

eigenmatrix = matrix(1, num_eigcoefs, 1, num_eigvectors);
fom_ordered_eigmatrix = matrix(1, num_eigcoefs, 1, num_eigvectors);
data_matrix = matrix(1, num_ftrs, 1, total_vectors);
newdata_matrix = matrix(1, num_ftrs, 1, total_vectors);
class_mean = matrix(1, num_ftrs, 1, num_classes);
class_variance = matrix(1, num_ftrs, 1, num_classes);
u = matrix(1, num_ftrs, 1, total_vectors);

Loopli(num_eigcoefs)
{
    fom[i] = eigvalue[i] = mean_vector[i] = 0.0;
    Looplj(num_eigvectors)
        eigenmatrix[i][j] = 0.0;
    Looplj(num_classes)
        class_mean[i][j] = class_variance[i][j] = 0.0;
}

/***Project the data into the eigenspace***/
if (type == 0)
{
    Loopli(total_vectors)
        Looplj(num_eigvectors)
            Looplk(num_ftrs)
                newdata_matrix[j][i] += data_matrix[k][i] * eigenmatrix[k][j];
}
else if (type == 1)
{
    Loopli(total_vectors)
        Looplj(total_vectors)
            Looplk(num_ftrs)
                newdata_matrix[k][i] += data_matrix[k][j] * eigenmatrix[j][i];
}

/***************************************************************************
Calculate class_mean and class_variance matrices
***************************************************************************/

Loopli(num_classes)
    Looplj(num_ftrs)
    {
        temp1 = 0.0;
        Looplk(vectors_per_class)
        {
            temp1 += newdata_matrix[j][k + ((i - 1) * vectors_per_class)];
        }
        class_mean[j][i] = temp1/vectors_per_class;
    }

Loopli(num_classes)
    Looplj(num_ftrs)
    {
        temp2 = 0.0;
        Looplk(vectors_per_class)
        {
            temp2 += (newdata_matrix[j][k + ((i - 1) * vectors_per_class)] - class_mean[j][i]) *
                     (newdata_matrix[j][k + ((i - 1) * vectors_per_class)] - class_mean[j][i]);
        }
        class_variance[j][i] = temp2/vectors_per_class;
    }


/***************************************************************************
Calculate across_class_mean matrix
***************************************************************************/

Loopli(num_ftrs)
{
    temp1 = 0.0;
    Looplj(num_classes)
        temp1 += class_mean[i][j];
    across_class_mean[i] = temp1/num_classes;
}

/***************************************************************************
Calculate mean_of_var and var_of_means vectors
***************************************************************************/

Loopli(num_ftrs)
{
    temp1 = 0.0;
    temp2 = 0.0;
    Looplj(num_classes)
    {
        temp1 += class_variance[i][j];
        temp2 += (class_mean[i][j] - across_class_mean[i]) * (class_mean[i][j] - across_class_mean[i]);
    }
    mean_of_var[i] = temp1/num_classes;
    var_of_means[i] = temp2/num_classes;
}


/***************************************************************************
Calculate fom (Figure of Merit) vector
***************************************************************************/

Loopli(num_ftrs)
    fom_ordered_vector[i] = fom[i] = var_of_means[i]/mean_of_var[i];

/***************************************************************************
Reorder the eigenvectors in order of decreasing FoM.
***************************************************************************/

eigsrt(fom_ordered_vector, fom_ordered_eigmatrix, num_eigvectors);

/***For testing, print out the eigvalue ordered vector vs the
FOM ordered vector
***/

open_write(outfile, "eigvsfom.out");
fprintf(outfile, "*\tEigvalue\tFOM Value\t\tOrdered FoM\n\n");
Loopli(num_eigvectors)
{
    fprintf(outfile, "%d)\t%f\t%f\t\t%f\n", i, eigvalue[i], fom[i], fom_ordered_vector[i]);
}
fclose(outfile);

exit(0);

/*****************Find eigenbases*******************************/

u = matrix(1, num_ftrs, 1, num_coefs);
Loopli(num_coefs)
    Looplj(num_ftrs)
        u[j][i] = 0.0;

if (type == 0)  /***Speaker***/
{
    Loopli(num_coefs)
        Looplj(num_ftrs)
            Looplk(num_ftrs)
                u[j][i] = fom_ordered_eigmatrix[k][i] * data_matrix[k][i] + u[j][i];
}
else if (type == 1)
{
    Loopli(num_coefs)
        Looplj(total_vectors)
            Looplk(num_ftrs)
                u[k][i] = (fom_ordered_eigmatrix[j][i] * data_matrix[k][j]) + u[k][i];
}

/*********** Write file containing list of eigenvector names*********/

sprintf(msg, "%s.train.out", type_name);
open_write(outfile, msg);
sprintf(msg1, "eigen%s", type_name);
Loopli(num_eigvectors)
{
    sprintf(file, "%s%d.dat", msg1, i);
    fprintf(outfile, "%s\n", file);
}
fclose(outfile);

/******************* Write out eigenbasis files***********************/

open_read(outfile, msg);
Loopli(num_coefs)
{
    fscanf(outfile, "%s", &file);
    open_write(datfile, file);
    Looplj(num_ftrs)
        fprintf(datfile, "%g\n", u[j][i]);
    fclose(datfile);
}
fclose(outfile);


/***************************************************************************
Free up data structure memory.
***************************************************************************/

free_matrix(u, 1, num_ftrs, 1, num_coefs);
free_vector(fom, 1, num_eigcoefs);
free_vector(across_class_variance, 1, num_classes);
free_vector(across_class_mean, 1, num_classes);
free_vector(mean_vector, 1, num_ftrs);
free_vector(eigvalue, 1, num_eigvectors);
free_vector(mean_of_var, 1, num_ftrs);
free_vector(var_of_means, 1, num_ftrs);
free_vector(fom_ordered_vector, 1, num_ftrs);
free_vector(fom_error, 1, num_ftrs);
free_vector(cum_fom_error, 1, num_ftrs);
free_vector(eig_error, 1, num_ftrs);
free_vector(cum_eig_error, 1, num_ftrs);

free_matrix(eigenmatrix, 1, num_eigcoefs, 1, num_eigvectors);
free_matrix(fom_ordered_eigmatrix, 1, num_eigcoefs, 1, num_eigvectors);
free_matrix(data_matrix, 1, num_ftrs, 1, total_vectors);
free_matrix(newdata_matrix, 1, num_ftrs, 1, total_vectors);
free_matrix(class_mean, 1, num_eigcoefs, 1, num_classes);
free_matrix(class_variance, 1, num_eigcoefs, 1, num_classes);

}
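
The figure of merit computed in fom_routine.c is, for each eigen-dimension, the variance of the class means divided by the mean of the within-class variances. The following is a minimal standalone sketch of that ratio for a single dimension. The names class_mean_of and figure_of_merit, the 0-based class-by-class array layout, and the use of double precision are assumptions made for illustration; they do not appear in the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the per_class samples belonging to class c (hypothetical helper). */
static double class_mean_of(const double *x, size_t c, size_t per_class)
{
    double sum = 0.0;
    for (size_t k = 0; k < per_class; k++)
        sum += x[c * per_class + k];
    return sum / (double)per_class;
}

/* Figure of merit for one dimension:
 *     variance of the class means / mean of the class variances.
 * x holds num_classes * per_class samples stored class by class.
 * The caller must ensure the mean within-class variance is not zero. */
double figure_of_merit(const double *x, size_t num_classes, size_t per_class)
{
    assert(x != NULL && num_classes > 0 && per_class > 0);

    double across_mean = 0.0, mean_of_var = 0.0, var_of_means = 0.0;

    for (size_t c = 0; c < num_classes; c++) {
        double m = class_mean_of(x, c, per_class);
        double sq = 0.0;
        for (size_t k = 0; k < per_class; k++) {
            double d = x[c * per_class + k] - m;
            sq += d * d;
        }
        across_mean += m;
        mean_of_var += sq / (double)per_class;
    }
    across_mean /= (double)num_classes;
    mean_of_var /= (double)num_classes;

    for (size_t c = 0; c < num_classes; c++) {
        double d = class_mean_of(x, c, per_class) - across_mean;
        var_of_means += d * d;
    }
    var_of_means /= (double)num_classes;

    return var_of_means / mean_of_var;
}
```

A dimension whose class means are far apart relative to the spread inside each class receives a large figure of merit, which is why the listing sorts the eigenvectors by this value.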


A.6.7  find_klspace.c.

/***************************************************************************

Program:  find_klspace.c

Description:  This routine is based around the Jacobi rotation routine
found in Numerical Recipes in C, and will calculate the eigenspace associated
with a set of data vectors.  Given the vector matrix, it will first calculate
the covariance matrix, and then the eigenvectors.  The eigenvector matrix
is then put into descending eigenvalue order and returned to the calling
program.  The eigenvectors may also be printed out here by un-commenting
the appropriate lines.  The code was written to be used with either face
or speaker data.  This routine was written to be used by the program fom_routine.

Author:  John G. Keller

Date:  1 Sep 93

***************************************************************************/

#include <stdio.h>
#include <math.h>
#include <string.h>
#include "jkmacros.h"

find_klspace(char infilename[], float **eig_vectors, float *eig_values, float *average_temp, float **data_matrix,
             int num_classes, int vectors_per_class, int num_ftrs, int type)

{
FILE *datfile,
     *outfile,
     *average,
     *infile,
     *train,
     *protofile;

int i, j, N, k, M,
    nrot,
    num_train;

float **matrix(),
      *vector(),
      **data_matrix_trans,
      **L,
      temp;

void free_vector(),
     free_matrix(),
     jacobi();

char file[40],
     filename[40];


if (type == 0)
{
    open_read(datfile, infilename);
}

num_train = num_classes * vectors_per_class;

/****** Allocate memory ******/

data_matrix_trans = matrix(1, num_train, 1, num_ftrs);

if (type == 1)  /***If face verification***/
{
    L = matrix(1, num_train, 1, num_train);
}
else if (type == 0)  /***If voice verification***/
{
    L = matrix(1, num_ftrs, 1, num_ftrs);
}

/****** Initialize matrix and vectors ******/

if (type == 1)  /***If face verification***/
{
    Loopli(num_train)
    {
        Looplj(num_ftrs)
            data_matrix_trans[i][j] = data_matrix[j][i] = 0.0;
    }

    open_read(train, "face.list");
    Loopli(num_classes * vectors_per_class)
    {
        fscanf(train, "%s", filename);
        open_read(protofile, filename);
        Looplj(num_ftrs)
        {
            fscanf(protofile, "%f", &data_matrix[j][i]);
        }
        fclose(protofile);
    }
    fclose(train);
}
else if (type == 0)  /***If voice verification***/
{
    Loopli(num_train)
    {
        Looplj(num_ftrs)
            data_matrix_trans[i][j] = data_matrix[j][i] = 0.0;
    }
    Loopli(num_train)
        Looplj(num_ftrs)
            fscanf(datfile, "%f", &data_matrix[j][i]);
    fclose(datfile);
}

/***************Calculate average vector*******************/

Loopli(num_ftrs)
{
    temp = 0.0;
    Looplj(num_train)
    {
        temp += data_matrix[i][j];
    }
    average_temp[i] = temp/num_train;
}

/**************Subtract average vector***********************/

Looplj(num_train)
    Loopli(num_ftrs)
        data_matrix[i][j] = data_matrix[i][j] - average_temp[i];

/*************Make transpose matrix************************/

Looplj(num_train)
    Loopli(num_ftrs)
        data_matrix_trans[j][i] = data_matrix[i][j];


/*******************Matrix multiply data_matrix by itself***************/

if (type == 1)  /***If face verification***/
{
    Loopli(num_train)
        Looplj(num_train)
        {
            temp = 0.0;
            Looplk(num_ftrs)
                temp = temp + data_matrix_trans[i][k] * data_matrix[k][j];
            L[i][j] = temp;
        }
}
else if (type == 0)  /***If speaker verification***/
{
    Loopli(num_ftrs)
        Looplj(num_ftrs)
        {
            temp = 0.0;
            Looplk(num_train)
                temp = temp + data_matrix_trans[k][i] * data_matrix[j][k];
            L[i][j] = temp;
        }
}

free_matrix(data_matrix_trans, 1, num_train, 1, num_ftrs);

/***************Do Jacobi rotation and sort eigenstuff***********/

if (type == 1)  /***If face verification***/
{
    jacobi(L, num_train, eig_values, eig_vectors, &nrot);
    eigsrt(eig_values, eig_vectors, num_train);
}
else if (type == 0)  /***If voice verification***/
{
    jacobi(L, num_ftrs, eig_values, eig_vectors, &nrot);
    eigsrt(eig_values, eig_vectors, num_ftrs);
}

if (type == 1)
    free_matrix(L, 1, num_train, 1, num_train);
else if (type == 0)
    free_matrix(L, 1, num_ftrs, 1, num_ftrs);

/***Can print out eigenvectors if desired by uncommenting here

open_write(outfile, outfilename);
fprintf(outfile, num_ftrs);
Loopli(num_ftrs)
{
    fprintf(outfile, average_temp[i]);
}
Loopli(num_ftrs)
{
    fprintf(outfile, "\n%f\n", eig_values[i]);
    Looplj(num_ftrs)
        fprintf(outfile, "%f ", eig_vectors[j][i]);  }
fclose(outfile);

***End of print eigenvector section****/

free_vector(average_temp, 1, num_ftrs);

}
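
For face data, find_klspace.c multiplies the transposed data matrix by the data matrix, giving a num_train x num_train matrix rather than a num_ftrs x num_ftrs one. Since num_ftrs is dim * dim for an image, the smaller matrix is far cheaper to diagonalize. If A^T A v = lambda v, then A A^T (A v) = lambda (A v), so eigenvectors in feature space can be recovered by multiplying the data matrix by the small eigenvectors. The sketch below shows only the mean subtraction and the small product. The function names, the 0-based row-major layout, and the use of double precision are assumptions for illustration, not taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

/* a is a num_ftrs x num_train matrix stored row-major: a[f * num_train + v].
 * Subtract from every feature row its mean over the training vectors. */
void subtract_mean(double *a, size_t num_ftrs, size_t num_train)
{
    assert(a != NULL && num_train > 0);
    for (size_t f = 0; f < num_ftrs; f++) {
        double mean = 0.0;
        for (size_t v = 0; v < num_train; v++)
            mean += a[f * num_train + v];
        mean /= (double)num_train;
        for (size_t v = 0; v < num_train; v++)
            a[f * num_train + v] -= mean;
    }
}

/* Form L = A^T A, a num_train x num_train matrix stored row-major:
 * L[i * num_train + j] = sum over features f of a[f][i] * a[f][j]. */
void small_scatter(const double *a, size_t num_ftrs, size_t num_train, double *L)
{
    assert(a != NULL && L != NULL);
    for (size_t i = 0; i < num_train; i++) {
        for (size_t j = 0; j < num_train; j++) {
            double sum = 0.0;
            for (size_t f = 0; f < num_ftrs; f++)
                sum += a[f * num_train + i] * a[f * num_train + j];
            L[i * num_train + j] = sum;
        }
    }
}
```

The resulting symmetric matrix is what a Jacobi-style eigensolver would then diagonalize; that step is left out here because it depends on the Numerical Recipes routines the listing calls.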


A.6.8  verify_face_net.c.

/***************************************************************************

Program:  verify_face_net.c

Description:  This program performs face recognition.  The program grabs an image of the person sitting
in front of the camera, processes that image, extracts the KLT coefficients and finds the closest match from
the faces in the training set.

Author:  Kenneth Runyon
Date:  8 July 92 - 31 Aug 92

Modified by:  John G. Keller
Date:  1 Sep 93

Modification Description:  Added capability for accepting command line arguments.  Added command
line option for specifying use of the net built for the nonlinear transformation for testing here.

***************************************************************************/

#include <stdio.h>
#include "rfc_lib.h"
#include "jkmacros.h"
#include "globals.h"

#define NUM_LAYRS 2
#define WT_SED 1918940490
#define PART_SED 1191645590
#define RNDM_SED 123456789
#define MAX_ITS 600
#define OUT_INT 100
#define ETA_IN 0.15
#define ETA_OUT 0.3
#define ETA_1_2 0.0
#define ALPHA 0.5
#define BAT_SZ 1
#define TRAIN_PCT 0.0
#define NORM 1


int dimension, j,
    num_coefs,
    num_train_faces,
    done,
    num_protos,
    happy,
    num_class;

FILE *fparam,
     *fhandle,
     *fweights,
     *fset,
     *ftable;

char wt_file[10],
     dat_file[10],
     hid_nodes,
     hid_nodes2,
     username[30],
     user1[30],
     answer[4],
     waste[2];

#define TRUE 1
#define FALSE 0

extern void center();
extern void gwind();
extern void ver_net_coefficients();

main(int argc, char *argv[])
{
FILE *vprob, *fprob, *nonlindata;

char command[30],
     user[30];

float yesprob,
      yesoutput,
      nooutput,
      sum,
      yesvoice,
      novoice,
      yesvoiceprob,
      fusedprob,
      fl_waste;

int i,
    PROCESS,
    NONLIN,
    arg_count,
    USE_FILE,
    type,
    num_not_person,
    SAME,
    int_waste;


#(lefine  SYNTAX  "Usage:  Terily_lace_net  <claiaed  identity>\n 


[  ‘  f  llenaM '  ]  [noprocesa]  D 


if (argc == 1)
{
    printf("\n%s\n", SYNTAX);
    exit(0);
}

PROCESS = TRUE;
NONLIN = FALSE;
USE_FILE = FALSE;
arg_count = argc - 1;

if (arg_count == 1)
{
    PROCESS = TRUE;
    NONLIN = FALSE;
}

if (arg_count == 2)
{
    if (strcmp(argv[2], "noprocess") == 0)
    {
        PROCESS = FALSE;
        NONLIN = FALSE;
    }
    else if (strcmp(argv[2], "nonlin") == 0)
    {
        PROCESS = TRUE;
        NONLIN = TRUE;
    }
    else if (fopen(argv[2], "r") == NULL)
    {
        printf("\nCan't open the file %s.\n", argv[2]);
        exit(0);
    }
    else USE_FILE = TRUE;
}

if (arg_count == 3)
{
    if ((strcmp(argv[2], "noprocess") == 0) || (strcmp(argv[3], "noprocess") == 0))
    {
        PROCESS = FALSE;
    }
    if ((strcmp(argv[2], "nonlin") == 0) || (strcmp(argv[3], "nonlin") == 0))
    {
        NONLIN = TRUE;
    }
    if ((fopen(argv[2], "r") == NULL) && (fopen(argv[3], "r") == NULL) && ((PROCESS == TRUE)
        || (NONLIN == FALSE)))
    {
        printf("\nCan't open either %s or %s.  Make sure you're using the proper syntax.\n%s",
               argv[2], argv[3], SYNTAX);
        exit(0);
    }
    if ((PROCESS == FALSE && NONLIN == FALSE) || (PROCESS == TRUE && NONLIN ==
        TRUE))
    {
        USE_FILE = TRUE;
    }
}

if (arg_count == 4)
{
    PROCESS = FALSE;
    NONLIN = TRUE;
    USE_FILE = TRUE;
    if (fopen(argv[2], "r") == NULL)
    {
        printf("\nCan't open file %s.\n", argv[2]);
        exit(0);
    }
}


sprintf(user, "%s_f_klt.wts", argv[1]);
if ((ftable = fopen(user, "r")) == NULL)
{
    printf("\nI can't open the file %s_f_klt.wts.\n", argv[1]);
    exit(0);
}
else
    fclose(ftable);

/*****************************************************
* read the parameters from train_params file
******************************************************/

open_read(fparam, "train_f_params");
fscanf(fparam, "%d", &dimension);
fscanf(fparam, "%d", &num_coefs);
fscanf(fparam, "%d", &num_train_faces);
fscanf(fparam, "%d", &num_protos);
fscanf(fparam, "%d", &num_class);
fclose(fparam);

/************* Create the setup file for the neural network **********/

sprintf(wt_file, argv[1]);
strcpy(dat_file, "klt_f.dat");
hid_nodes = 2 * num_coefs;
hid_nodes2 = 0;

fset = fopen("setup.mlp", "w");
fprintf(fset, "%d\n%d\n%d\n%d\n%s -store weights\n%d\n", NUM_LAYRS, WT_SED, PART_SED, RNDM_SED, wt_file,
        MAX_ITS);
fprintf(fset, "%d %d %d %d\n", num_coefs, hid_nodes, hid_nodes2, num_class);
fprintf(fset, "%s -data\n%d\n%f\n%f\n%f\n%f\n%d\n%f\n%d\n",
        dat_file, OUT_INT, ETA_IN, ETA_OUT, ETA_1_2, ALPHA, BAT_SZ, TRAIN_PCT, NORM);
fclose(fset);

/***************************************************************************
Either use an existing image file or grab a new one.  Can either grab
a single image or use the segmentation algorithm.
***************************************************************************/

if (fopen("user.gra", "r") != NULL) system("rm user.gra");
happy = 0;

if (USE_FILE == TRUE)
{
    sprintf(command, "cp %s user.gra", argv[2]);
    system(command);
}
else
{
    while (!happy)
    {
        autograb("user");
        sprintf(command, "display%d user.gra stay", SM_WIDTH);
        system(command);
        printf("\nIs the picture satisfactory (y/n)?  ");
        gets(answer);
        if ((answer[0] == 'y') || (answer[0] == 'Y'))
            break;
    }
}

if (PROCESS == TRUE)
{
    center(dimension, "correlate.ref", "user.gra");
    gwind(dimension, "user.gra");
    center(dimension, "gwind.ref", "user.gra");
}

/****** create the data file and store the kl coefficients**********/

open_write(ftable, "waste");
open_write(fweights, "klt_f.dat");
fprintf(fweights, "%d\n%d\n", num_coefs, num_class);
sprintf(user, "user.gra");
type = 1;

if (NONLIN == FALSE)
    face_net_coefficients(dimension, num_coefs, user, fweights, ftable, num_class, type, type);
else
    face_net_coefficients(dimension, num_train_faces, user, fweights, ftable, num_class, type, type);
fclose(fweights);

if (NONLIN == TRUE)
{
    open_read(fweights, "klt_f.dat");
    open_write(nonlindata, "facetest.dat");
    fprintf(nonlindata, "%d\n1\n1\n", num_train_faces);
    Loopli(3)
        fscanf(fweights, "%d", &int_waste);
    Looplj(num_train_faces)  /***Num of features***/
    {
        fscanf(fweights, "%f ", &fl_waste);
        fprintf(nonlindata, "%f ", fl_waste);
    }
    fclose(fweights);
    fclose(nonlindata);

    system("xfeatures facetest.dat");
    open_read(nonlindata, "newtest.dat");
    open_write(fweights, "klt_f.dat");
    fscanf(nonlindata, "%d ", &int_waste);
    fprintf(fweights, "2\n2\n0 ");
    Loopli(num_coefs)
    {
        fscanf(nonlindata, "%f ", &fl_waste);
        fprintf(fweights, "%f ", fl_waste);
    }
    fprintf(fweights, "0.90000 0.10000\n");
    fclose(nonlindata);
    fclose(fweights);
}


/****** find the best matching training face ******/

#ifdef RESULTS
system("mlp_fuse_file");
/******fuselist.c has the other file writing stuff.  The actual net
*******outputs are written to the file 'face_prob' by dkmain.c com-
*******piled as mlp_fuse_list*************************************/
#else
system("mlp_fuse");
#endif

/******If not writing to a file, output to the screen***************/

#ifndef RESULTS
j = 0;
while (argv[2][j] != '\0')
{
    if (isalpha(argv[2][j]))
        j++;
    else
        argv[2][j] = 0;
}

if (strcmp(argv[1], argv[2]) == 0)
    SAME = TRUE;
else
    SAME = FALSE;

open_read(fprob, "node_out");
fscanf(fprob, "%f%f", &yesoutput, &nooutput);
fclose(fprob);

num_not_person = (num_train_faces/num_protos) - 1;

yesprob = (yesoutput * (num_not_person/2.0))/((yesoutput * num_not_person/2.0) + nooutput * 0.5);
printf("\n\nThe post-probability based on face that this is %s is %f.\n\n", argv[1], yesprob);
if (SAME == TRUE)
    printf("\nClaimed ID is true ID.  Verification probability is %f\n", yesprob);
else if (SAME == FALSE)
    printf("\nImposter.  Verification probability is %f\n", yesprob);

open_write(fprob, "face_prob");
fprintf(fprob, "%f ", yesprob);
fclose(fprob);
#endif

/****** remove trash files *******/

/*system("rm test.coefs");
if (fopen("*.rle", "r") != NULL) system("rm *.rle");
if (fopen("*.red", "r") != NULL) system("rm *.red");
if (fopen("*.rec", "r") != NULL) system("rm *.rec");  */
if (fopen("waste", "r") != NULL) system("rm waste");

}
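
The post-probability in verify_face_net.c is formed from the two net outputs by weighting the 'yes' output with half the number of other people in the training set and the 'no' output with one half, then normalizing. A minimal sketch of that expression as a standalone function follows. The function name post_probability and the use of double precision are assumptions for illustration; only the arithmetic is taken from the listing.

```c
#include <assert.h>

/* Post-probability that the claimed identity is correct, following the
 * expression in the listing:
 *     yes * (n / 2) / (yes * (n / 2) + no * 0.5)
 * where n is the number of other people in the training set. */
double post_probability(double yes_output, double no_output, int num_not_person)
{
    assert(num_not_person > 0);

    double yes_term = yes_output * (num_not_person / 2.0);
    double no_term = no_output * 0.5;

    /* Both net outputs being zero would leave the ratio undefined. */
    assert(yes_term + no_term > 0.0);

    return yes_term / (yes_term + no_term);
}
```

With one other person in the training set the two weights are equal and the result reduces to yes / (yes + no); with more people the 'yes' output is weighted more heavily.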

A.6.9  verify_voice_net.c.

/***************************************************************************

Program:  verify_voice_net.c

Description:  This routine is based on verify_face_net, written by Ken Runyon.  It verifies the identity of
a speaker via a file, and can be modified to capture a speaker's voice live.  It also maintains the capability
of concatenating raw speaker cepstral vectors (to retain temporal information) or using codebook vectors
for verification.

Author:  John G. Keller
Date:  1 Sep 93

***************************************************************************/

#include <stdio.h>
#include "rfc_lib.h"
#include "jkmacros.h"
#include "globals.h"

#define NUM_LAYRS 2
#define WT_SED 1918940490
#define PART_SED 1191645590
#define RNDM_SED 123456789
#define MAX_ITS 1000
#define OUT_INT 100
#define ETA_IN 0.15
#define ETA_OUT 0.3
#define ETA_1_2 0.0
#define ALPHA 0.5
#define BAT_SZ 1
#define TRAIN_PCT 0.0
#define NORM 1

int length, j,
    num_coefs,
    num_speaker_vectors,
    done,
    num_protos,
    happy,
    num_in_chunk,
    num_chunks,
    num_class,
    total_num_coefs,
    num_not_person,
    temp;

FILE *fparam,
     *fhandle,
     *fweights,
     *fset,
     *ftable,
     *handle,
     *fdat,
     *fdat2;

char wt_file[10],
     dat_file[10],
     hid_nodes,
     hid_nodes2,
     username[30],
     user1[30],
     answer[4],
     waste[2];

float flnumber, temp1, temp2;

#define TRUE 1
#define FALSE 0


main(int argc, char *argv[])
{
FILE *vprob, *prob, *nonlindata;

char command[30],
     user[30];

float yesprob,
      yesoutput,
      nooutput,
      fusedprob,
      fl_waste;

int i,
    USE_FILE,
    SAME,
    NONLIN,
    int_waste;

#define SYNTAX "Usage: verify_voice_net <claimed identity> ['filename'] ['nonlin']\n"

if (argc == 1 || argc > 4)
{
    printf("\n%s\n", SYNTAX);
    exit(0);
}

USE_FILE = FALSE;
if (argc == 3 || argc == 4)
    USE_FILE = TRUE;

if (argc == 4)
    NONLIN = TRUE;

sprintf(user, "%s_s_klt.wts", argv[1]);
open_read(ftable, user);


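/***************************************************************************
Either use an existing speaker file or grab a new one.
***************************************************************************/

/***If verifying from file, copy file to 'user.speech' ***/

if (USE_FILE == TRUE)
{
    sprintf(command, "cp %s user.speech", argv[2]);
    system(command);
}
else
{
    /***insert speaker capture routines here.  Put resultant speech in
    user.speech***/
}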


/*****************************************************
* read the parameters from train_s_params file
******************************************************/

open_read(fparam, "train_s_params");
fscanf(fparam, "%d", &num_coefs);
fscanf(fparam, "%d", &num_class);
fscanf(fparam, "%d", &num_in_chunk);
fscanf(fparam, "%d", &num_not_person);
fclose(fparam);

open_read(fhandle, "user.speech");
fscanf(fhandle, "%d", &num_protos);
fscanf(fhandle, "%d", &length);
rewind(fhandle);

/****** create the data file and store the kl coefficients**********/

open_write(fweights, "klt.dat");
fprintf(fweights, "%d\n%d\n", num_coefs, num_class);
sprintf(user, "user.speech");
if (NONLIN != TRUE)
    voice_net_coefficients(length, num_coefs, fweights, fhandle, num_class, num_protos);
else
    voice_net_coefficients(length, length, fweights, fhandle, num_class, num_protos);
fclose(fweights);

system("rm user.speech");

/***************************************************************************
We now have the klt.dat file containing all the sets of klt coefficients
for user.  We next will rewrite the klt.dat file to represent
concatenated "chunks" of vectors.  A chunk of '1' will be a single frame's
coefficients.
***************************************************************************/

num_chunks = (int) (num_protos/num_in_chunk);
total_num_coefs = num_in_chunk * num_coefs;
open_read(fweights, "klt.dat");
open_write(handle, "temp.dat");

fscanf(fweights, "%d", &temp);
fprintf(handle, "%d\n", total_num_coefs);
fscanf(fweights, "%d", &temp);
fprintf(handle, "%d\n", temp);

Loopli(num_chunks)
{
    fprintf(handle, "%d ", i - 1);
    Looplj(total_num_coefs)
    {
        fscanf(fweights, "%f", &flnumber);
        fprintf(handle, "%f ", flnumber);
    }
    fprintf(handle, "0.10000 0.10000\n");
}
fclose(handle);
fclose(fweights);

system("mv temp.dat klt_s.dat");

if (NONLIN == TRUE)
{
    open_read(fweights, "klt_s.dat");
    open_write(nonlindata, "spkrtest.dat");
    fprintf(nonlindata, "%d\n1\n64\n", length);
    Loopli(3)
        fscanf(fweights, "%d", &int_waste);
    Looplj(length)  /***Num of features***/
    {
        fscanf(fweights, "%f ", &fl_waste);
        fprintf(nonlindata, "%f ", fl_waste);
    }
    fclose(fweights);
    fclose(nonlindata);

    system("xfeatures spkrtest.dat");
    open_read(nonlindata, "newtest.dat");
    open_write(fweights, "klt_s.dat");
    fscanf(nonlindata, "%d ", &int_waste);
    fprintf(fweights, "2\n2\n");
    Looplj(64)
    {
        fprintf(fweights, "%d ", j - 1);
        Loopli(num_coefs)
        {
            fscanf(nonlindata, "%f ", &fl_waste);
            fprintf(fweights, "%f ", fl_waste);
        }
        fprintf(fweights, "0.90000 0.10000\n");
    }
    fclose(nonlindata);
    fclose(fweights);
}

/************ Create the setup file for the neural network **********/

sprintf(wt_file, "%s_s_klt.wts", argv[1]);
strcpy(dat_file, "klt_s.dat");
hid_nodes = 2 * total_num_coefs;
hid_nodes2 = 0;

fset = fopen("setup.mlp", "w");
fprintf(fset, "%d\n%d\n%d\n%d\n%s -store weights\n%d\n", NUM_LAYRS, WT_SED, PART_SED, RNDM_SED, wt_file,
        MAX_ITS);
fprintf(fset, "%d %d %d %d\n", total_num_coefs, hid_nodes, hid_nodes2, num_class);
fprintf(fset, "%s -data\n%d\n%f\n%f\n%f\n%f\n%d\n%f\n%d\n",
        dat_file, OUT_INT, ETA_IN, ETA_OUT, ETA_1_2, ALPHA, BAT_SZ, TRAIN_PCT, NORM);
fclose(fset);

/****** find the best matching training speaker ******/

if (fopen("node_out", "r") != NULL) system("rm node_out");

#ifdef RESULTS
system("mlp_fuse_file");
/******fuselist.c has the other file writing stuff.  The actual net
*******outputs are written to the file 'face_prob' by dkmain.c com-
*******piled as mlp_fuse_list*************************************/
#else
system("mlp_fuse");
#endif

/******If not writing to a file, output to the screen***************/

#ifndef RESULTS
j = 0;
while (argv[2][j] != '\0')
{
    if (isalpha(argv[2][j]))
        j++;
    else
        argv[2][j] = 0;
}

if (strcmp(argv[1], argv[2]) == 0)
    SAME = TRUE;
else
    SAME = FALSE;

open_read(vprob, "node_out");
sprintf(command, "%s_prob.dat", argv[1]);
if (SAME == TRUE) open_write(prob, command);
temp1 = temp2 = 0.0;
Loopli(num_chunks)
{
    fscanf(vprob, "%f%f", &yesoutput, &nooutput);
    if (SAME == TRUE) fprintf(prob, (yesoutput * (num_not_person/2.0))/((yesoutput * num_not_person/2.0)
        + nooutput * 0.5));
    temp1 += (yesoutput * (num_not_person/2.0))/((yesoutput * num_not_person/2.0) + nooutput * 0.5);
    temp2 += 1.0 - (yesoutput * (num_not_person/2.0))/((yesoutput * num_not_person/2.0) + nooutput
        * 0.5);
}
fclose(vprob);
if (SAME == TRUE) fclose(prob);
yesprob = temp1/(temp1 + temp2);

printf("\nThe post-probability based on voice that this is %s is %f.\n\n", argv[1], yesprob);
open_write(vprob, "voice_prob");
fprintf(vprob, "%f ", yesprob);
fclose(vprob);
#endif

/****** remove trash files *******/
if (fopen("waste", "r") != NULL) system("rm waste");

}
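
In verify_voice_net.c each chunk of speech yields its own post-probability, and the chunks are combined by accumulating the probabilities in one sum and their complements in another, then dividing the first sum by the total. Because each probability and its complement add to one, this ratio equals the arithmetic mean of the per-chunk probabilities. The sketch below shows the accumulation; the function name combine_chunks and the array interface are assumptions for illustration and are not part of the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Combine per-chunk post-probabilities the way the listing does:
 * yes_sum accumulates p, no_sum accumulates (1 - p), and the result
 * is yes_sum / (yes_sum + no_sum), which is the mean of p. */
double combine_chunks(const double *chunk_prob, size_t num_chunks)
{
    assert(chunk_prob != NULL && num_chunks > 0);

    double yes_sum = 0.0, no_sum = 0.0;
    for (size_t i = 0; i < num_chunks; i++) {
        yes_sum += chunk_prob[i];
        no_sum += 1.0 - chunk_prob[i];
    }
    return yes_sum / (yes_sum + no_sum);
}
```

A single confident chunk therefore cannot dominate the decision; every chunk contributes equally to the final voice probability.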

A.6.10  verify_identity.c.

/***************************************************************************

Program:  verify_identity.c

Description:  This program will invoke, in turn, the face verifier and the speaker verifier, and will then
fuse the results.

Author:  John G. Keller

Date:  1 Sep 93

***************************************************************************/

#include <stdio.h>
#include "jkmacros.h"

main(int argc, char *argv[])
{
int i;

float faceprob,
      voiceprob,
      identprob;

char command[30];

FILE *fhandle;

if (argc < 2 || argc == 3)
{
    printf("\nSYNTAX: verify_identity <claimed username> [<face file> <voice file>]\n\n");
    exit(0);
}

if (argc == 2)  /***Capture face and voice live***/
{
    sprintf(command, "verify_face_net %s", argv[1]);
    system(command);
    sprintf(command, "verify_voice_net %s", argv[1]);
    system(command);
}
else
{
    /***********Make sure the files exist*********************/

    open_read(fhandle, argv[2]);
    open_read(fhandle, argv[3]);

    sprintf(command, "verify_face_net %s %s", argv[1], argv[2]);
    system(command);
    sprintf(command, "verify_voice_net %s %s", argv[1], argv[3]);
    system(command);
}

/***************************************************************************
Get the face and voice probabilities resultant from running the
individual verifiers. Fuse those probabilities.
***************************************************************************/

open_read(fhandle, "face.prob");
fscanf(fhandle, "%f", &faceprob);
fclose(fhandle);

open_read(fhandle, "voice.prob");
fscanf(fhandle, "%f", &voiceprob);
fclose(fhandle);

open_write(fhandle, "ident.prob");
for (i = 0; i < 10; i++)
{
identprob = 0.1 * i * faceprob + (1 - (0.1 * i)) * voiceprob;
fprintf(fhandle, "%f %f\n\t%f\n", (0.1 * i), 1 - (0.1 * i), identprob);
printf("Probability of verified identity based on %d%% face and %d%% voice = %f\n", 10 * i,
10 * (10 - i), identprob);
}
fclose(fhandle);
}
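
The fusion loop above forms a weighted sum of the two verifier outputs, stepping the face weight from 0.0 to 0.9 in increments of 0.1. The rule can be sketched in isolation as follows. The function name `fuse_probabilities` and the range check are illustrative assumptions and do not appear in the original listing.

```c
#include <assert.h>

/* Hypothetical helper (not part of the original listing): linearly fuse
   two verifier probabilities. face_weight is the fraction of the decision
   given to the face verifier; the voice verifier receives the remainder. */
float fuse_probabilities(float face_prob, float voice_prob, float face_weight)
{
    /* The weight is assumed to lie in [0, 1], as in the loop above. */
    assert(face_weight >= 0.0f && face_weight <= 1.0f);
    return face_weight * face_prob + (1.0f - face_weight) * voice_prob;
}
```

With a face weight of 0.0 the result is the voice probability alone, and with 1.0 it is the face probability alone; intermediate weights interpolate between the two.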


A.6.11 xnetpush.c.

Program:  xnetpush.c 

Description: This program performs a non-linear transformation on multiple classes of input vectors.
The weight update rule causes the transformed classes to be pushed apart from each other, with the intent
of moving the classes into a reduced and more separable space than the one in which they began. The
weights are saved to a file, as are the final outputs from the net. These outputs are also put into multiple
files for plotting by Gnuplot.

Author:  John  G.  Keller 

Date: 15 Oct 93

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>
#include "jkmacros.h"

#define epoch 10

/*******Declare global variables and functions************/

int num_prototypes,
num_classes,
num_features,
num_out_nodes,
num_hidden_nodes,
output_type;

float **data_matrix,
**weight12,
**weight23,
*temp_hidden_out1,
*temp_hidden_out2,
*temp_out1,
*temp_out2;

float *vector(), **matrix(), free_vector(), free_matrix(), ran1();
/***************Begin main program****************/

main(int argc, char *argv[])
{
int i, j, k, l, m, n,
max_iterations,
temp,
random_pick1,
random_pick2,
random_seed,
EXIT_SWITCH,
OUT_OF_CLASS,
max_vectors[5],
class1,
class2;

float
**new_data_matrix,
*hidden_node_out,
*delta_out,
*delta_hidden,
*temp_total_dist,
*sum_weight_out,
*mean_feature,
*variance_feature,
*delta_out1,
*delta_out2,
*diff_out,
*sum_out_delta1,
*sum_out_delta2,
*delta_hidden_out1,
*delta_hidden_out2,
eta_in,
eta_out,
temp1,
temp_hold1,
temp_hold2,
temp_hold,
sum_diff_out,
norm_factor,
total_error,
min_distance,
scale_factor,
min_dist,
**desired,
*sum_diff,
sum_weight;

char command[30];
FILE *netinfo, *data, *weightfile, *fhandle, *error_out, *error_in, *newtest;


if (argc != 2)
{
printf("\nSYNTAX: trainxnetbatch <datafile name>\n\n");
exit(0);
}

open_read(netinfo, "bpnet.dat");
fscanf(netinfo, "%d\n%d\n%f\n%f\n%d\n%d\n%d", &num_out_nodes, &num_hidden_nodes, &eta_in, &eta_out,
&max_iterations, &random_seed, &output_type);
fclose(netinfo);

random_seed = -random_seed;
open_read(data, argv[1]);
fscanf(data, "%d\n%d\n%d\n", &num_features, &num_classes, &num_prototypes);
num_hidden_nodes++;
num_features += 1; /***Account for augmentation***/


/***************************************************************************
Declare and initialize matrices and vectors.
***************************************************************************/

hidden_node_out = vector(1, num_hidden_nodes);
sum_weight_out = vector(1, num_hidden_nodes);
temp_total_dist = vector(1, num_out_nodes);
mean_feature = vector(1, num_features);
variance_feature = vector(1, num_features);
delta_out1 = vector(1, num_out_nodes);
delta_out2 = vector(1, num_out_nodes);
diff_out = vector(1, num_out_nodes);
sum_out_delta1 = vector(1, num_hidden_nodes);
sum_out_delta2 = vector(1, num_hidden_nodes);
delta_hidden_out1 = vector(1, num_hidden_nodes);
delta_hidden_out2 = vector(1, num_hidden_nodes);
weight12 = matrix(1, num_hidden_nodes, 1, num_features);
weight23 = matrix(1, num_out_nodes, 1, num_hidden_nodes);
data_matrix = matrix(1, num_features, 1, (num_prototypes * num_classes));
temp_out1 = vector(1, num_out_nodes);
temp_out2 = vector(1, num_out_nodes);
temp_hidden_out1 = vector(1, num_hidden_nodes);
temp_hidden_out2 = vector(1, num_hidden_nodes);


/***************************************************************************
Load all the data into a single matrix. Will be able to extract specific
class vectors later by keeping track of the indices.
***************************************************************************/

Loop1i(num_prototypes * num_classes)
{
Loop1j(num_features - 1)
fscanf(data, "%f", &data_matrix[j][i]);
data_matrix[num_features][i] = 1.0; /***Augment each vector with 1.0***/
}


/***************************************************************************
Normalize all the data across the features.
***************************************************************************/

normalize(data_matrix, mean_feature, variance_feature);

/***
printf("random_seed = %d\n", random_seed);
***/

/***************************************************************************
Initialize net weights. 'ran1' is a Numerical Recipes routine that returns
a random float between 0.0 and 1.0.
***************************************************************************/

Loop1ij(num_hidden_nodes, num_features)
{
weight12[i][j] = 1.0 * (ran1(&random_seed) - 0.5);
}

Loop1ij(num_out_nodes, num_hidden_nodes)
{
weight23[i][j] = 1.0 * (ran1(&random_seed) - 0.5);
}


/*********Open file for writing output error**********/
open_write(error_in, "error.inclass");
open_write(error_out, "error.outclass");


/*************************************************************************** 

This  is  the  start  of  the  main  loop.  Loop  until  we  exceed  m  iterations. 

***************************************************************************/ 

m  =  0; 

while (m < max_iterations)

{ 

/***************************************************************************
Randomly pick two vectors from the data set and determine to which
classes they belong. If they are in the same class, we wish to push
the outputs together; if they're in different classes, we wish to
push the outputs apart. Set the variable "OUT_OF_CLASS" to indicate same
or different classes.
***************************************************************************/
random_pick1 = (int)(ran1(&random_seed) * num_prototypes * num_classes);
while (random_pick1 == 0) random_pick1 = (int)(ran1(&random_seed) * num_prototypes * num_classes);
class1 = random_pick1/num_prototypes + 1;

label:
random_pick2 = (int)(ran1(&random_seed) * num_prototypes * num_classes);
while (random_pick2 == 0) random_pick2 = (int)(ran1(&random_seed) * num_prototypes * num_classes);
class2 = random_pick2/num_prototypes + 1;

if (m < max_iterations/4 || m > 3 * max_iterations/4)
{
if (class1 == class2) goto label;
}
else
if (class1 != class2) goto label;

/**********Compare the two classes and set the flag***************/

if (class1 == class2) OUT_OF_CLASS = 0;
else OUT_OF_CLASS = 1;


/***************************************************************************
Loop once through net for each vector. Save the outputs of each node
for later calculation of the new weights.
***************************************************************************/

/***First compute the hidden layer outputs for each vector***/

compute_hidden_nodes(data_matrix, weight12, num_hidden_nodes, random_pick1, temp_hidden_out1);
compute_hidden_nodes(data_matrix, weight12, num_hidden_nodes, random_pick2, temp_hidden_out2);

/**********Now compute the output nodes**************/

compute_output_nodes(weight23, temp_hidden_out1, temp_out1, num_out_nodes, num_hidden_nodes,
output_type);
compute_output_nodes(weight23, temp_hidden_out2, temp_out2, num_out_nodes, num_hidden_nodes,
output_type);

temp1 = 0.0;
Loop1i(num_out_nodes)
temp1 += sqr(temp_out1[i] - temp_out2[i]);

if (OUT_OF_CLASS == 1) fprintf(error_out, "%f\n", sqrt(temp1));
else if (OUT_OF_CLASS == 0) fprintf(error_in, "%f\n", sqrt(temp1));
/**********************Update the weights***************************/

/********Build some terms for use by the update rule***********/

Loop1i(num_out_nodes)
{
delta_out1[i] = temp_out1[i] * (1 - temp_out1[i]);
delta_out2[i] = temp_out2[i] * (1 - temp_out2[i]);
diff_out[i] = temp_out1[i] - temp_out2[i];
}

Loop1i(num_hidden_nodes)
{
sum_out_delta1[i] = sum_out_delta2[i] = 0.0;
Loop1j(num_out_nodes)
{
sum_out_delta1[i] += diff_out[j] * weight23[j][i] * delta_out1[j];
sum_out_delta2[i] += diff_out[j] * weight23[j][i] * delta_out2[j];
delta_hidden_out1[i] = temp_hidden_out1[i] * (1 - temp_hidden_out1[i]);
delta_hidden_out2[i] = temp_hidden_out2[i] * (1 - temp_hidden_out2[i]);
}
}

/***********First update the output layer weights**********/

Loop1i(num_hidden_nodes)
Loop1j(num_out_nodes)
{
if (OUT_OF_CLASS == 1)
{
weight23[j][i] += eta_out * diff_out[j] * (delta_out1[j] * temp_hidden_out1[i] - delta_out2[j] *
temp_hidden_out2[i]);
}
else if (OUT_OF_CLASS == 0)
{
weight23[j][i] -= eta_out * diff_out[j] * (delta_out1[j] * temp_hidden_out1[i] - delta_out2[j] *
temp_hidden_out2[i]);
}
}

/***********Then update the hidden layer weights***************/

Loop1i(num_features)
Loop1j(num_hidden_nodes)
{
if (OUT_OF_CLASS == 1)
{
weight12[j][i] += eta_in * sum_out_delta1[j] * delta_hidden_out1[j] * data_matrix[i][random_pick1]
- sum_out_delta2[j] * delta_hidden_out2[j] * data_matrix[i][random_pick2];
}
else if (OUT_OF_CLASS == 0)
{
weight12[j][i] -= eta_in * sum_out_delta1[j] * delta_hidden_out1[j] * data_matrix[i][random_pick1]
- sum_out_delta2[j] * delta_hidden_out2[j] * data_matrix[i][random_pick2];
}
}

} 

m++; 

if (m > max_iterations) break;

} 

fclose(error_in);
fclose(error_out);

/***************************************************************************
Save the weights to a file.
***************************************************************************/

open_write(newtest, "newtest.dat");

Loop1k(num_prototypes * num_classes)
{
/**********Compute the output of the hidden nodes************/
compute_hidden_nodes(data_matrix, weight12, num_hidden_nodes, k, temp_hidden_out1);

/**********Now compute the output nodes**************/
compute_output_nodes(weight23, temp_hidden_out1, temp_out1, num_out_nodes, num_hidden_nodes,
output_type);

Loop1i(num_out_nodes)
{
fprintf(newtest, "%f ", temp_out1[i]);
}
fprintf(newtest, "\n");
}

fclose(newtest);

open_write(weightfile, "bpnet.wts");

fprintf(weightfile, "%d\n%d\n%d\n%d\n", num_features, num_hidden_nodes, num_out_nodes, output_type);
Loop1i(num_features - 1)
fprintf(weightfile, "%f \n%f \n", mean_feature[i], variance_feature[i]);

Loop1i(num_hidden_nodes)
{
Loop1j(num_features)
fprintf(weightfile, "%f ", weight12[i][j]);
fprintf(weightfile, "\n");
}

Loop1i(num_out_nodes)
{
Loop1j(num_hidden_nodes)
fprintf(weightfile, "%f ", weight23[i][j]);
fprintf(weightfile, "\n");
}

fclose(weightfile);

printf("\nTotal epochs: %d\n\n", m);

sprintf(command, "xfeatures %s", argv[1]);
/***system(command);***/

sprintf(command, "gnudata newtest.dat %d", num_classes);
system(command);


/***************************************************************************
Free memory from matrices and vectors.
***************************************************************************/

free_vector(hidden_node_out, 1, num_hidden_nodes);
free_vector(sum_weight_out, 1, num_hidden_nodes);
free_vector(temp_total_dist, 1, num_out_nodes);
free_vector(mean_feature, 1, num_features);
free_vector(variance_feature, 1, num_features);
free_vector(delta_out1, 1, num_out_nodes);
free_vector(delta_out2, 1, num_out_nodes);
free_vector(delta_hidden_out1, 1, num_hidden_nodes);
free_vector(delta_hidden_out2, 1, num_hidden_nodes);
free_vector(diff_out, 1, num_out_nodes);
free_vector(sum_out_delta1, 1, num_hidden_nodes);
free_vector(sum_out_delta2, 1, num_hidden_nodes);
free_matrix(weight12, 1, num_hidden_nodes, 1, num_features);
free_matrix(weight23, 1, num_out_nodes, 1, num_hidden_nodes);
free_matrix(data_matrix, 1, (num_prototypes * num_classes), 1, num_features);
free_matrix(temp_out1, 1, num_out_nodes);
free_matrix(temp_out2, 1, num_out_nodes);
free_matrix(temp_hidden_out1, 1, num_hidden_nodes);
free_matrix(temp_hidden_out2, 1, num_hidden_nodes);
}
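
The push/pull idea in the main loop (move a pair of outputs apart when the inputs come from different classes, together when they share a class) can be illustrated on a much simpler model. The sketch below is a deliberate simplification, assuming a single linear map y = Wx rather than the two-layer sigmoid net of the listing; the names `pair_distance_sq` and `push_pull_step` and the fixed dimensions are hypothetical.

```c
#include <assert.h>

#define N_IN  3   /* illustrative input dimension (assumption) */
#define N_OUT 2   /* illustrative output dimension (assumption) */

/* Squared Euclidean distance between the transformed versions of x1 and x2. */
double pair_distance_sq(double w[N_OUT][N_IN], const double x1[N_IN], const double x2[N_IN])
{
    double sum = 0.0;
    for (int i = 0; i < N_OUT; i++) {
        double d = 0.0;
        for (int j = 0; j < N_IN; j++)
            d += w[i][j] * (x1[j] - x2[j]);
        sum += d * d;
    }
    return sum;
}

/* One update on the linear map y = W x (not the listing's sigmoid net).
   out_of_class = 1 pushes the pair apart; out_of_class = 0 pulls it together. */
void push_pull_step(double w[N_OUT][N_IN], const double x1[N_IN], const double x2[N_IN],
                    int out_of_class, double eta)
{
    double diff[N_OUT];
    double sign = out_of_class ? 1.0 : -1.0;

    assert(eta > 0.0);

    /* Output difference y1 - y2 under the current weights. */
    for (int i = 0; i < N_OUT; i++) {
        diff[i] = 0.0;
        for (int j = 0; j < N_IN; j++)
            diff[i] += w[i][j] * (x1[j] - x2[j]);
    }

    /* The gradient of 0.5 * |y1 - y2|^2 with respect to w[i][j] is
       diff[i] * (x1[j] - x2[j]); ascend it to push, descend it to pull. */
    for (int i = 0; i < N_OUT; i++)
        for (int j = 0; j < N_IN; j++)
            w[i][j] += sign * eta * diff[i] * (x1[j] - x2[j]);
}
```

In this linear case a push step scales the output difference by (1 + eta * |x1 - x2|^2), so the distance grows, and a pull step with a small eta shrinks it. The listing applies the same sign convention but propagates the difference through the sigmoid derivatives of both layers.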


normalize(float **data_matrix, float *mean_feature, float *variance_feature)
{
int i, j;
float sum,
*sum_features,
*sum_var;
FILE *fhandle;

sum_features = vector(1, num_features);
sum_var = vector(1, num_features);

Loop1i(num_features - 1)
{
sum_features[i] = 0.0;
Loop1j(num_prototypes * num_classes)
{
sum_features[i] += data_matrix[i][j];
}
}

Loop1i(num_features - 1)
mean_feature[i] = sum_features[i]/(num_prototypes * num_classes);

Loop1i(num_features - 1)
{
sum_var[i] = 0.0;
Loop1j(num_prototypes * num_classes)
{
sum_var[i] += sqr(data_matrix[i][j] - mean_feature[i]);
}
}

Loop1i(num_features - 1)
variance_feature[i] = sum_var[i]/(num_prototypes * num_classes);

Loop1i(num_features - 1)
Loop1j(num_prototypes * num_classes)
data_matrix[i][j] = (data_matrix[i][j] - mean_feature[i])/variance_feature[i];

open_write(fhandle, "normalize.dat");

Loop1i(num_prototypes * num_classes)
{
Loop1j(num_features - 1)
fprintf(fhandle, "%f ", data_matrix[j][i]);
fprintf(fhandle, "\n");
}

fclose(fhandle);

free_vector(sum_features, 1, num_features);
free_vector(sum_var, 1, num_features);
}
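
The normalization routine above centers each feature on its mean and then divides by the feature's variance. A compact sketch of the same computation for a single feature follows. The function name `normalize_feature`, the 0-based indexing, and the zero-variance guard are assumptions added for illustration; the listing itself uses 1-based Numerical Recipes arrays and has no such guard.

```c
#include <assert.h>

/* Hypothetical helper: normalize one feature (n samples, 0-based) in place.
   Mirrors the listing: subtract the mean, then divide by the variance.
   Dividing by the standard deviation is the more usual convention. */
void normalize_feature(float *values, int n, float *mean, float *variance)
{
    float sum = 0.0f, sum_sq = 0.0f;
    int i;

    assert(n > 0);

    for (i = 0; i < n; i++)
        sum += values[i];
    *mean = sum / n;

    for (i = 0; i < n; i++)
        sum_sq += (values[i] - *mean) * (values[i] - *mean);
    *variance = sum_sq / n;          /* population variance, as in the listing */

    for (i = 0; i < n; i++) {
        values[i] -= *mean;
        if (*variance > 0.0f)        /* guard added here; the listing has none */
            values[i] /= *variance;
    }
}
```

For the samples 1, 2, 3, 4 the mean is 2.5 and the variance is 1.25, so the first sample becomes (1 - 2.5) / 1.25 = -1.2.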


compute_hidden_nodes(float **data_matrix, float **weight12, int num_hidden_nodes, int vector, float *temp_hidden_out)
{
int i, j;
float temp_hold1;

Loop1i(num_hidden_nodes - 1)
{
temp_hold1 = 0.0;
Loop1j(num_features)
{
temp_hold1 += weight12[i][j] * data_matrix[j][vector];
}
temp_hidden_out[i] = 1.0/(1.0 + exp(-temp_hold1));
}

temp_hidden_out[num_hidden_nodes] = 1.0; /***Account for augmentation***/
}
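
"Augmentation" in these listings means appending a constant 1.0 to each input (and hidden) vector so that the bias term is carried as one more weight. A minimal sketch of the resulting weighted sum is given below; the function name `augmented_weighted_sum` and the 0-based layout, with the bias weight stored last, are assumptions for illustration only.

```c
#include <assert.h>

/* Hypothetical helper: weighted sum of n features plus a bias, where the
   bias is stored as weights[n] and multiplies an implicit input of 1.0. */
float augmented_weighted_sum(const float *weights, const float *features, int n)
{
    float sum = 0.0f;
    int j;

    assert(n >= 0);

    for (j = 0; j < n; j++)
        sum += weights[j] * features[j];

    return sum + weights[n] * 1.0f;   /* augmented input is the constant 1.0 */
}
```

With weights {0.5, -1.0, 0.25} and features {2.0, 1.0}, the sum is 1.0 - 1.0 + 0.25 = 0.25. In the listing this sum is then passed through the sigmoid 1/(1 + exp(-x)) for hidden nodes.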


compute_output_nodes(float **weight23, float *temp_hidden_out, float *temp_out, int num_out_nodes, int
num_hidden_nodes, int output_type)
{
int i, j;
float temp_hold1;

Loop1i(num_out_nodes)
{
temp_hold1 = 0.0;
Loop1j(num_hidden_nodes - 1)
{
temp_hold1 += weight23[i][j] * temp_hidden_out[j];
}

if (output_type == 0) /***If linear output***/
{
temp_out[i] = temp_hold1;
}
else if (output_type == 1) /***If nonlinear***/
{
temp_out[i] = 1.0/(1.0 + exp(-temp_hold1));
}
}
}
Appendix B. Source Data

This appendix contains the source data for the performance figures given in Chapter
4. Tables B.1 through B.8 give the data for face verification accuracy, and Tables B.9
through B.16 provide the data for speaker verification.
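
Each entry in the tables that follow is a percentage of trials decided correctly: genuine claims accepted (true accept) or impostor claims rejected (true reject). A sketch of that arithmetic is shown below. The function name `accuracy_percent` is hypothetical, and the trial counts are not stated in this appendix; the 20 % steps in the face true-accept tables would be consistent with five genuine trials per subject, but that is only an inference from the data.

```c
#include <assert.h>

/* Hypothetical helper: percentage of trials decided correctly.
   For a true-accept table, `correct` counts genuine claims that were
   accepted; for a true-reject table, impostor claims that were rejected. */
double accuracy_percent(int correct, int trials)
{
    assert(trials > 0 && correct >= 0 && correct <= trials);
    return 100.0 * correct / trials;
}
```

For example, four correct decisions out of five trials give 80.0 %.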


Table B.1 Face verification accuracy when the subject’s claimed identity is his true identity (two dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      100.0 %       100.0 %
dprescot      100.0 %, 20.0 %  [one value missing in source; column positions uncertain]
eingham       100.0 %      100.0 %       80.0 %
jcossent      100.0 %      100.0 %       80.0 %
jkeller       100.0 %      100.0 %       100.0 %
jmiller       100.0 %      100.0 %       100.0 %

Table B.2 Face verification accuracy when the subject’s claimed identity is not his true identity (two dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      68.0 %        20.0 %
dprescot      84.0 %       100.0 %       40.0 %
eingham       96.0 %       80.0 %        64.0 %
jcossent      60.0 %       88.0 %        100.0 %
jkeller       20.0 %       68.0 %        16.0 %
jmiller       96.0 %       60.0 %        20.0 %
Table B.3 Face verification accuracy when the subject’s claimed identity is his true identity (four dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %, 100.0 %  [one value missing in source; column positions uncertain]
dprescot      100.0 %      100.0 %       80.0 %
eingham       100.0 %      100.0 %       80.0 %
jcossent      100.0 %      100.0 %       40.0 %
jkeller       100.0 %      100.0 %       100.0 %
jmiller       100.0 %      100.0 %       100.0 %

Table B.4 Face verification accuracy when the subject’s claimed identity is not his true identity (four dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      92.0 %        8.0 %
dprescot      100.0 %      100.0 %       64.0 %
eingham       100.0 %      80.0 %        72.0 %
jcossent      96.0 %       100.0 %       96.0 %
jkeller       64.0 %       80.0 %        48.0 %
jmiller       88.0 %       60.0 %        8.0 %
Table B.5 Face verification accuracy when the subject’s claimed identity is his true identity (six dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      100.0 %       80.0 %
dprescot      100.0 %      100.0 %       100.0 %
eingham       100.0 %      100.0 %       80.0 %
jcossent      100.0 %      100.0 %       100.0 %
jkeller       100.0 %      100.0 %       0.0 %
jmiller       100.0 %      100.0 %       100.0 %

Table B.6 Face verification accuracy when the subject’s claimed identity is not his true identity (six dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      100.0 %       80.0 %
dprescot      100.0 %      100.0 %       80.0 %
eingham       100.0 %      100.0 %       16.0 %
jcossent      100.0 %      100.0 %       60.0 %
jkeller       80.0 %       88.0 %        72.0 %
jmiller       84.0 %       88.0 %        88.0 %
Table B.7 Face verification accuracy when the subject’s claimed identity is his true identity (eight dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      100.0 %       100.0 %
dprescot      100.0 %      100.0 %       100.0 %
eingham       100.0 %      100.0 %       80.0 %
jcossent      100.0 %      100.0 %       40.0 %
jkeller       100.0 %      100.0 %       100.0 %
jmiller       100.0 %      100.0 %       100.0 %

Table B.8 Face verification accuracy when the subject’s claimed identity is not his true identity (eight dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      100.0 %       28.0 %
dprescot      96.0 %       100.0 %       88.0 %
eingham       96.0 %       96.0 %        44.0 %
jcossent      100.0 %      96.0 %        72.0 %
jkeller       88.0 %       84.0 %        44.0 %
jmiller       100.0 %      100.0 %       88.0 %
Table B.9 Speaker verification accuracy when the subject’s claimed identity is his true identity (four dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       25.0 %       50.0 %        0.0 %
dprescot      100.0 %      100.0 %       100.0 %
jkeller       0.0 %, 0.0 %  [one value missing in source; column positions uncertain]
jmiller       100.0 %, 0.0 %  [one value missing in source; column positions uncertain]
jtreleav      100.0 %  [two values missing in source; column position uncertain]
kmccrae       50.0 %, 100.0 %  [one value missing in source; column positions uncertain]
mchin         75.0 %       25.0 %        25.0 %
rmacdona      0.0 %        0.0 %         100.0 %
wgool         0.0 %        100.0 %       100.0 %

Table B.10 Speaker verification accuracy when the subject’s claimed identity is not his true identity (four dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       59.4 %, 78.1 %  [one value missing in source; column positions uncertain]
dprescot      46.9 %       84.4 %        81.3 %
jkeller       93.8 %, 100.0 %  [one value missing in source; column positions uncertain]
jmiller       31.3 %, 53.1 %  [one value missing in source; column positions uncertain]
jtreleav      84.4 %       25.0 %        28.1 %
kmccrae       34.4 %       71.9 %        56.3 %
mchin         40.6 %       87.5 %        78.1 %
rmacdona      68.8 %       81.3 %        37.5 %
wgool         62.5 %       25.0 %        28.1 %

Table B.11 Speaker verification accuracy when the subject’s claimed identity is his true identity (six dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       50.0 %       0.0 %         0.0 %
dprescot      100.0 %      100.0 %       100.0 %
jkeller       100.0 %, 75.0 %  [one value missing in source; column positions uncertain]
jmiller       100.0 %, 0.0 %  [one value missing in source; column positions uncertain]
jtreleav      100.0 %      100.0 %       100.0 %
kmccrae       100.0 %, 100.0 %  [one value missing in source; column positions uncertain]
mchin         50.0 %       0.0 %         25.0 %
rmacdona      0.0 %        75.0 %        50.0 %
wgool         100.0 %      0.0 %         100.0 %

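Table B.12 Speaker verification accuracy when the subject’s claimed identity is not his true identity (six dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       87.5 %, 87.5 %  [one value missing in source; column positions uncertain]
dprescot      46.9 %, 87.5 %  [one value missing in source; column positions uncertain]
jkeller       50.0 %, 87.5 %  [one value missing in source; column positions uncertain]
jmiller       21.9 %       59.4 %        28.1 %
jtreleav      43.8 %       31.3 %        15.6 %
kmccrae       81.3 %       37.5 %        65.6 %
mchin         96.9 %       93.8 %        68.8 %
rmacdona      84.4 %       37.5 %        50.0 %
wgool         40.6 %, 31.3 %  [one value missing in source; column positions uncertain]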
Table B.13 Speaker verification accuracy when the subject’s claimed identity is his true identity (eight dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       50.0 %       75.0 %        0.0 %
dprescot      75.0 %       100.0 %       100.0 %
jkeller       100.0 %      50.0 %        50.0 %
jmiller       100.0 %, 25.0 %  [one value missing in source; column positions uncertain]
jtreleav      100.0 %      100.0 %       100.0 %
kmccrae       100.0 %      0.0 %         100.0 %
mchin         75.0 %       0.0 %         25.0 %
rmacdona      0.0 %        0.0 %         100.0 %
wgool         100.0 %      100.0 %       100.0 %

Table B.14 Speaker verification accuracy when the subject’s claimed identity is not his true identity (eight dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       78.1 %       78.1 %        68.8 %
dprescot      34.4 %       81.3 %        78.1 %
jkeller       53.1 %       93.8 %        93.8 %
jmiller       40.6 %       12.5 %        28.1 %
jtreleav      28.1 %       28.1 %        25.0 %
kmccrae       75.0 %       65.6 %        31.3 %
mchin         100.0 %      93.8 %        68.8 %
rmacdona      81.3 %       81.3 %        37.5 %
wgool         40.6 %       31.3 %        37.5 %
Table B.15 Speaker verification accuracy when the subject’s claimed identity is his true identity (ten dimensions).

Claimed Identity is the True Identity (True Accept Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       100.0 %      50.0 %        25.0 %
dprescot      100.0 %      75.0 %        100.0 %
jkeller       100.0 %, 25.0 %  [one value missing in source; column positions uncertain]
jmiller       100.0 %      0.0 %         100.0 %
jtreleav      100.0 %, 100.0 %  [one value missing in source; column positions uncertain]
kmccrae       100.0 %      100.0 %       100.0 %
mchin         100.0 %      0.0 %         25.0 %
rmacdona      75.0 %       0.0 %         75.0 %
wgool         25.0 %       100.0 %       100.0 %

Table B.16 Speaker verification accuracy when the subject’s claimed identity is not his true identity (ten dimensions).

Claimed Identity is Not True Identity (True Reject Accuracy)

Claimed ID    Min_error    Eigenvalue    FoM
cmartin       53.1 %       75.0 %        68.8 %
dprescot      50.0 %       87.5 %        84.4 %
jkeller       59.4 %, 87.5 %  [one value missing in source; column positions uncertain]
jmiller       78.1 %       78.1 %        37.5 %
jtreleav      81.3 %       25.0 %        25.0 %